Transactions 
on INFORMATION THEORY 


Journal Devoted to the Theoretical and Experimental Aspects of Information Transmission, Processing and Utilization. 


Volume IT-7 JULY, 1961 Number 3


Published Quarterly 


In This Issue 


Progress in Information Theory in the U. S. A., 1957-1960 


On the Approach of a Filtered Pulse Train to a Stationary Gaussian Process 


The Axis Crossings of a Stationary Gaussian Markov Process 


On Optimal Diversity Reception 


A New Derivation of the Entropy Expressions 


The Use of Group Codes in Error Detection and Message Retransmission 


On the Factorization of Rational Matrices 


PUBLISHED BY THE 
Professional Group on Information Theory 


IRE Professional Group on Information Theory 


The Professional Group on Information Theory is an organization, within the framework of the IRE, of 
members with principal professional interest in Information Theory. All members of the IRE are eligible 
for membership in the Group and will receive all Group publications upon payment of an annual fee of 
$4.00. 


ADMINISTRATIVE COMMITTEE 


P. E. Green, Jr. (’63), Chairman 


M.I.T. Lincoln Laboratory
Lexington, Mass. 


N. M. Abramson (’63) 
Elec. Engrg. Dept. 
Stanford University 
Stanford, Calif. 


T. P. Cheatham, Jr. (’62) 
Litton Industries, Inc. 
Beverly Hills, Calif. 


G. A. Deschamps ('62)
University of Illinois
Urbana, Ill.


D. A. Huffman ('63)
Mass. Inst. Tech.
Cambridge, Mass. 


Hughes Research Labs. 
Malibu, Calif. 


J. L. Kelly, Jr. (’63) 
Bell Telephone Labs., Inc. 
Murray Hill, N. J. 


Ernest R. Kretzmer (’62) 
Bell Telephone Labs., Inc. 
Murray Hill, N. J. 


W. Wesley Peterson (’64) 
Univ. of Florida 
Gainesville, Fla.


Mischa Schwartz (’64) 
Polytechnic Inst. of Brooklyn 
Brooklyn, N. Y. 


R. A. Silverman ('63)
147-15 Village Road
Jamaica, N. Y.


G. L. Turin ('62), Vice Chairman; A. G. Schillinger ('61), Secretary-Treasurer
Polytechnic Institute of Brooklyn
Brooklyn, N. Y.


F. L. H. M. Stumpers ('62)
Research Laboratories
N. V. Philips
Gloeilampenfabrieken
Eindhoven, Netherlands


Peter Swerling ('64)
Rand Corp.
Santa Monica, Calif.


David Van Meter ('64)
Litton Industries, Inc.
Waltham, Mass.


L. A. Zadeh ('64)
University of California
Berkeley, Calif.


TRANSACTIONS 


A. Kohlenberg, Editor
Melpar, Inc.
Watertown, Mass.

A. Nuttall, Associate Editor
Litton Industries, Inc.
Waltham, Mass.


P. E. Green, Jr.
Editorial Policy Committee
M.I.T. Lincoln Laboratory
Lexington, Mass.

Peter Elias
Editorial Policy Committee
Mass. Inst. Tech.
Cambridge, Mass.


L. A. Zadeh 
Editor for Special Papers 
University of California 
Berkeley, Calif. 

IRE Transactions on Information Theory is published in January, April, July, and October, by the
IRE for the Professional Group on Information Theory, at 1 East 79 Street, New York 21, N. Y. In addition
to these regular quarterly issues, Special Issues appear from time to time. Responsibility for contents
rests upon the authors and not upon the IRE, the Group, or its members. Individual copies of this issue and
all available back issues, except PGIT-4, may be purchased at the following prices: IRE members (one
copy) $2.25, libraries and colleges $3.25, all others $4.50. Annual subscription rate: non-members $17.00;
libraries and colleges, $12.75.


Copyright © 1961, The Institute of Radio Engineers, Inc.
Printed in U.S.A.


All rights, including translation, are reserved by the IRE. Requests for republication privileges
should be addressed to the Institute of Radio Engineers, 1 E. 79 St., New York 21, N. Y.




TABLE OF CONTENTS 


Contributions                                                              Page

Progress in Information Theory in the U.S.A., 1957-1960
    P. Elias, A. Gill, R. Price, P. Swerling, L. Zadeh, and N. Abramson      128
On the Approach of a Filtered Pulse Train to a Stationary Gaussian Process
    Phillip Bello                                                            144
The Axis Crossings of a Stationary Gaussian Markov Process
    J. A. McFadden                                                           150
On Optimal Diversity Reception
    George L. Turin                                                          154
A New Derivation of the Entropy Expressions
    S. W. Golomb                                                             166
The Use of Group Codes in Error Detection and Message Retransmission
    W. R. Cowell                                                             168
On the Factorization of Rational Matrices
    Dante C. Youla                                                           172

Correspondence
Noise in an Amplitude Selective Detector
    William M. Waters                                                        190
A Frequency-Weighted Mean-Square Error Criterion
    Daniel S. Ruchkin                                                        192
Information Theory and the Separability of Signals with Overlapping Spectra
    L. Lorne Campbell                                                        193
On the Approximation to Likelihood Ratio Detection Laws (The Threshold Case)
    Herman Blasbalg                                                          194

Contributors                                                                 196
Abstracts                                                                    197
Book Reviews                                                                 205




Progress in Information Theory in the U.S.A., 1957-1960*


P. ELIAS, A. GILL, R. PRICE, N. ABRAMSON, P. SWERLING, AND L. ZADEH


This is the first in a series of invited tutorial, status and survey papers that will be
provided from time to time by the PGIT Committee on Special Papers, whose Chairman
is currently L. A. Zadeh. Hopefully these papers will fill a gap that we have long felt existed
in our publication program. In the past, there has been no formal method, short of entire
Special or Monograph Issues, of providing basic introductory material or surveys of portions
of the information theory field. (The Administrative Committee)


INTRODUCTION 


The following report comprises five parts. Part 1 is
concerned with contributions centering on Shannon's
theory and the theory of coding. Part 2 deals with those 
results in the theory of random processes which are of 
relevance to communication problems. Part 3 surveys 
advances of a basic nature in pattern recognition. Part 4 
is concerned primarily with the detection of signals in 
noise. Part 5 is given over to prediction and filtering, 
centering on Wiener’s theory and its extensions. 


PART 1: INFORMATION THEORY AND CODING 


P. ELIAS†, FELLOW, IRE


SINCE 1957, there has been considerable progress in
the theory of coding messages for transmission
over noisy channels. There have been three main
directions of advance. First, there has been work on 
the foundations of the theory. During this time, Ameri- 
can mathematicians interested in probability have shown 
a serious interest in information theory, especially 
since Feinstein’s work (now available in book form [12]), 
and since the interest shown by Kolmogorov and Khinchin. 
Second, a great deal of work has been done on error- 
correcting block codes for noisy binary channels. This 
work has involved a good deal of modern algebra, and 
some mathematical algebraists have been joining the 
communications research workers in attacking these 
problems. Third, there has been continuing investigation
of procedures in which input messages are coded and
decoded sequentially rather than in long blocks. This
work and the work on binary block codes both have
significant practical implications for electrical communications.


* This work was supported partially by the U.S. Army Signal Corps, the AF Office of Scientific Research and the Office of Naval Research. The material is part of a Commission 6 progress report prepared for the U.S.A. National Committee of URSI (International Scientific Radio Union) for presentation at the XIIIth General Assembly of URSI in London, Eng., September 5-15, 1960. Text of a report by L. Weinberg on circuit theory, an earlier version of the present one, and reports by other commissions of URSI, may be found in the November-January, 1960, issue of the J. Res. NBS.

† Dept. of Elec. Engrg. and Research Lab. of Electronics, Mass. Inst. Tech., Cambridge, Mass.


FOUNDATIONS 


Shannon's original demonstration of the noisy-channel coding theorem was an existence proof [31]. Given a channel of capacity C bits per second, and a rate of transmission, R bits per second, the transmitter sends sequences of N channel input symbols. The receiver receives sequences of N channel output symbols and decides which input sequence was transmitted, making this decision incorrectly with probability P. What Shannon showed was that for R < C, P could be made arbitrarily small by increasing N. The proof was not constructive, and nothing quantitative was said about how rapidly P decreased as a function of N for given R and C. Feinstein [11], [12] showed that P could be bounded by a decaying exponential in N. His proof covered channels with a simple kind of finite memory. While constructive in principle, it could not be used in practice to construct a code with large N. In 1957, Shannon [32] gave a remarkably concise proof based on his original random coding argument, but more detailed and precise; this also gave an exponential bound to P as a function of N, and extended the proof to channels with considerably more complex memory. Blackwell, Breiman, and Thomasian [2] proved the existence theorem for channels with a finite-state memory of a still more general kind. Wolfowitz [40] and Feinstein [13] have also proved converse theorems: the weak converse being that for R > C, P cannot approach zero, and the strong converse being that for R > C, P must approach 1.
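To make the quantities R and C concrete, here is a minimal Python sketch for one assumed example, the binary symmetric channel, which flips each digit independently with probability p; the report does not single out this channel, and the names `binary_entropy` and `bsc_capacity` are illustrative. Its capacity per channel symbol is 1 - H(p); multiplying by the symbol rate gives C in bits per second.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits; H(0) = H(1) = 0 by continuity."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(p: float) -> float:
    """Capacity, in bits per channel symbol, of a binary symmetric channel
    that flips each digit independently with probability p."""
    return 1.0 - binary_entropy(p)


p = 0.11
C = bsc_capacity(p)
print(f"crossover p = {p}: C = {C:.3f} bits per symbol")
# The coding theorem: P can be driven toward 0 by increasing N only when R < C.
for R in (0.25, 0.75):
    print(f"R = {R}: {'below' if R < C else 'above'} capacity")
```

With p = 0.11 the capacity is almost exactly half a bit per symbol, so a rate of 0.25 lies in the region where the coding theorem applies and 0.75 lies in the region covered by the converses.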

The kind of technique used by Shannon [32] can be extended to obtain upper and lower bounds to the rate of exponential decay of P with N. Earlier work on binary channels had shown that for a considerable range of R less than C, the upper and lower bounds essentially agreed, and the best possible behavior could be uniquely specified. Similar results have been obtained by Shannon for more general channels. This work is not yet published, but the case of a continuous channel with additive Gaussian noise has been treated in detail [34].

The increasing interest of mathematicians in this field is evidenced by an article by Wolfowitz [39]. In general, the results which the mathematicians have obtained are better proofs, under more general circumstances, of theorems whose general character was not surprising to communications researchers. However, a recent paper [3] has presented an interesting new problem, defining capacity and proving a coding theorem for a channel whose parameters are not known precisely, but are constrained to lie in known ranges. This work might be relevant to incompletely measured and time-varying radio channels. So might a paper by Shannon [33] on channels in which the transmitter has side information available about the state of a channel with memory: an example would be the information obtained by measurements of the propagation medium obtained while communicating.


BINARY CHANNELS


Starting with the earlier work of Hamming [16] and Slepian [35], [36], error-correcting block codes for binary channels have been investigated extensively. Peterson and Fontaine [24] have searched for best possible error-correcting codes of short block length (up to 29), using a computer. The number of codes grows so rapidly with block length that it was necessary to use many equivalence relations and short-cut tests to eliminate codes from consideration early. A number of counter-examples were found to common conjectures about optimum codes.

The use of error-correcting codes, in practice, has been limited by the difficulty of implementation, and by the fact that in many applications of interest, the errors in the channel are not independent, but occur in runs or bursts. In an earlier work, Huffman [17] had shown an encoding and decoding procedure for the Hamming code which was simple to implement, and Green and San Soucie [14] have shown an easy implementation for a short multiple-error-correcting code. Hagelbarger [15] has described codes which correct errors occurring in bursts and whose implementation is not too difficult; Abramson [1] has described a highly efficient and easily implemented set of codes with similar properties.
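Part of what makes the Hamming code simple to implement is that its parity checks can be arranged so the pattern of failed checks spells out, in binary, the position of a single error. The Python sketch below uses that standard textbook arrangement for the (7, 4) code, with check digits at positions 1, 2 and 4; it is an illustration of syndrome decoding, not a reconstruction of Huffman's circuit, and the function names are hypothetical.

```python
from typing import List


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data digits into 7; check digits sit at positions 1, 2, 4 (1-indexed)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # even parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # even parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # even parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word: List[int]) -> List[int]:
    """Correct up to one error and return the 4 data digits."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos - 1]:
            syndrome ^= pos            # XOR of the positions holding a 1
    if syndrome:                       # nonzero syndrome = position of the error
        w[syndrome - 1] ^= 1
    return [w[2], w[4], w[5], w[6]]


codeword = hamming74_encode([1, 0, 1, 1])
received = codeword.copy()
received[4] ^= 1                       # one digit corrupted in transit
print(hamming74_decode(received))      # [1, 0, 1, 1]
```

For a valid codeword the XOR of the positions holding a 1 is zero, so a single flipped digit leaves exactly its own position as the syndrome.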

Work on codes of longer length, which can correct multiple errors, started with a decoding procedure given by Reed [28] some time ago for the Reed-Muller family of codes. For really large block lengths, these codes are not efficient; but Perry [23] has built a coder and decoder for a Reed-Muller code which has a block length of 128 digits, 64 of which are information digits and 64 check digits. This code can correct any set of 7 or fewer errors among the group of 128, and the efficiency is quite good. Using microsecond switching devices, the units can keep up with millisecond binary digits.

Calabi and Haefeli [6] have investigated in detail the burst-correcting properties of a family of codes which had been introduced earlier for correction of independent errors [7]. They also discuss the implementation of these codes.

A new family of codes discovered by Bose and Ray- 
Chaudhuri [4], [5] is much more efficient than the Reed- 
Muller codes for large block lengths. Although in the 
limit of infinite block length, these codes may also have 
zero efficiency, at lengths of a few thousand digits they 
are still quite good. Peterson [25] has discovered an 
economical way to decode these codes. There is a great 
deal of current work on finding more properties of these 
codes, finding similar codes for channels which are sym- 
metric but not binary, and so forth. 

There has been a good deal of recent work on cyclic 
codes, including some encouraging results on step-by-step 
decoding due to Prange [27]. Cyclic codes are closely 
related to the sequences which can be generated by shift 
registers with feedback connections. Recent discussions 
of these sequences have been given by Elspas [9] and by 
Zierler [42]. A review of the recent algebraic work on 
coding theory, including the Galois field theory which 
enters in the Bose-Chaudhuri codes, will be given by 
Peterson in a monograph to be published shortly [26]. 
Most of the results in this area extend to channels which 
have an input alphabet of symbols whose number is 
not 2, but any prime to any power, the channel still 
being completely symmetric in the way it makes its 
errors. Nonbinary channels have been investigated in 
their own right by Lee [20] and by Ulrich [38].
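The link between cyclic codes and shift registers with feedback can be seen in a small case. The Python sketch below assumes the feedback polynomial x^3 + x + 1 (an illustrative choice, not taken from the report). The register then runs through all 7 nonzero states before repeating, and, as a standard fact, the 7 cyclic shifts of one period together with the all-zero word form a (7, 3) cyclic code. The helper names are hypothetical.

```python
from typing import List


def lfsr_sequence(taps: List[int], state: List[int], length: int) -> List[int]:
    """Binary sequence from a shift register with linear feedback:
    s[n + L] = XOR of s[n + j] for j in taps, where L = len(state)."""
    L = len(state)
    s = list(state)
    while len(s) < length:
        n = len(s) - L
        s.append(sum(s[n + j] for j in taps) % 2)
    return s[:length]


def period(seq: List[int], L: int) -> int:
    """Smallest p > 0 at which the first L-digit window of seq recurs."""
    for p in range(1, len(seq) - L + 1):
        if seq[p:p + L] == seq[:L]:
            return p
    raise ValueError("sequence too short to show a full period")


# Feedback polynomial x^3 + x + 1, i.e. s[n+3] = s[n+1] XOR s[n]
seq = lfsr_sequence(taps=[0, 1], state=[1, 0, 0], length=20)
print(seq[:7])          # [1, 0, 0, 1, 0, 1, 1]
print(period(seq, 3))   # 7 = 2**3 - 1, the maximum for a 3-stage register
```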

The introduction of two thresholds rather than one in 
a continuous channel introduces a null zone. The trans- 
mitter sends a binary signal, but the receiver makes a 
ternary decision, not attempting to guess the value of 
signals received in the null zone. Introducing the null 
zone may increase channel capacity, as shown by Bloom, 
et al. [30]. It also has the valuable effect of reducing the 
amount of computation required in decoding, since it is 
easier to replace missing digits than to correct incorrect 
ones. This is especially relevant for application to channels 
with Rayleigh fading. 
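Why a missing digit is easier to handle than a wrong one can be shown with the simplest possible code, a single even-parity check. The Python sketch assumes antipodal signalling (+1 for digit 1, -1 for digit 0) and a symmetric null zone |y| <= theta; both, and the helper names, are illustrative assumptions. A digit received in the null zone is erased, and the parity check restores it. Had that digit instead been received wrong, the same check would show that some digit was in error but not which one.

```python
from typing import List, Optional

ERASURE = None  # marks a digit received in the null zone


def null_zone_decide(y: float, theta: float) -> Optional[int]:
    """Ternary decision on one received sample, assuming antipodal signalling
    (+1 sent for digit 1, -1 for digit 0) and a null zone |y| <= theta."""
    if y > theta:
        return 1
    if y < -theta:
        return 0
    return ERASURE


def fill_single_erasure(word: List[Optional[int]]) -> List[int]:
    """Restore at most one erased digit in a word carrying one even-parity check."""
    missing = [i for i, b in enumerate(word) if b is ERASURE]
    if len(missing) > 1:
        raise ValueError("one parity check can restore only one erasure")
    out = [0 if b is ERASURE else b for b in word]
    if missing:
        # The erased digit is whatever makes the overall parity even.
        out[missing[0]] = sum(out) % 2
    return out


sent = [1, 0, 1, 1, 1]                    # four data digits plus an even-parity digit
received = [0.9, -1.1, 0.1, 1.2, 0.8]     # the third sample falls in the null zone
decisions = [null_zone_decide(y, theta=0.3) for y in received]
print(decisions)                          # [1, 0, None, 1, 1]
print(fill_single_erasure(decisions))     # [1, 0, 1, 1, 1]
```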


SEQUENTIAL DECODING 


Earlier work had shown that the block coding pro- 
cedure could be modified (in the binary case) by con- 
structing codes in a convolutional fashion, so that the 
coding and decoding of each digit was of the same 
character and involved the same delay [8]. The parameter 
which replaces block length in such an argument is the 
delay between the receipt of a digit and the attempt to 
decode it reliably. This simplified the coding, but left 
the decoding procedure as complicated as ever. However, 
Wozencraft [41] has shown that a suitable sequential- 
coding procedure may be followed by a sequential-de- 
coding procedure which reduces the average amount of 
decoding computation immensely. Like the best of the 
long block codes now in prospect, this procedure promises 
millisecond communication with microsecond switching 
circuitry in the decoder at very high reliability. Unlike 
the block codes, however, Wozencraft's procedure is
statistical and not highly algebraic, and it may be
expected to generalize to other discrete channels with no
special symmetry properties. On the other hand, the 
computation remains reasonable only for a range of R 
well below C. Epstein [10] has studied a sequential de- 
coding procedure for the erasure channel; work on more 
general channels is under way. 
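The convolutional construction mentioned at the start of this section can be sketched briefly: every input digit passes through the same short shift register, so the coding of each digit is the same operation and no block boundaries occur. The Python sketch below is a rate-1/2 encoder with a two-stage register. The tap vectors (1, 1, 1) and (1, 0, 1) are a common textbook choice assumed for illustration, not a code from the report, and Wozencraft's sequential decoder is not shown.

```python
from typing import List, Tuple


def conv_encode(bits: List[int],
                g1: Tuple[int, int, int] = (1, 1, 1),
                g2: Tuple[int, int, int] = (1, 0, 1)) -> List[Tuple[int, int]]:
    """Rate-1/2 convolutional encoder with a two-stage shift register.
    Each output pair is a mod-2 sum of the current digit and the two before it,
    weighted by the tap vectors g1 and g2."""
    s1 = s2 = 0                        # register starts cleared
    out = []
    for u in bits:
        window = (u, s1, s2)           # current digit, then the two previous ones
        o1 = sum(g & w for g, w in zip(g1, window)) % 2
        o2 = sum(g & w for g, w in zip(g2, window)) % 2
        out.append((o1, o2))
        s1, s2 = u, s1                 # shift the register by one digit
    return out


print(conv_encode([1, 0, 1, 1]))  # [(1, 1), (1, 0), (0, 0), (0, 1)]
```

Each input digit influences only the output pairs within the register span; in the codes used for sequential decoding that span is much longer, and it plays the role that block length plays for block codes.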


CONCLUSIONS ON CODING 


The general conclusions of interest for applications of 
error-correcting codes are two. First, there are now several 
good, small codes which correct bursts of errors, and 
which could be instrumented fairly easily for use in 
situations in which a rate well below capacity can be 
tolerated, so that short codes may be used. These may 
find early application in sending digital data over tele- 
phone lines. Second, there are now available several kinds 
of large block codes and sequential codes which will 
permit very reliable transmission over long-distance 
scatter channels, which can also be implemented. The 
cost of implementation is appreciable in these cases, but 
current computer circuitry is fast enough to permit de- 
coding at transmission rates of the order of a thousand 
binary digits per second, coded in blocks or with sequential 
constraints hundreds of digits in length; the alternatives 
of more large antennas or greater transmitter power are 
also expensive. It seems likely that such systems will be 
in experimental use by the next international URSI 
meeting in 1963. 


OTHER TOPICS


Less progress has been made in the economical coding of information sources. In part, this is because such progress becomes work in speech analysis or on television systems, and not information theory as such. However, it might be worth noting that a scheme for coding runs of constant intensity in television has been demonstrated at full television speed by Schreiber [29].

A relation between the bandwidth and the duration 
of a signal is imposed by the Heisenberg uncertainty 
principle, whose applicability to time functions was 
pointed out by Gabor many years ago. Kay and Silver- 
man [19] have examined this relationship more carefully, 
and a form of the uncertainty principle which places a 
lower bound on the sums of entropies, rather than on 
the products of second moments, is discussed by Leipnik 
[21]. Stam [37] also discusses this entropic inequality 
and closely related results. 

The sampling theorem is closely related to these questions. Linden and Abramson [22] have given a generalization which permits the closed-form expression of a band-limited function in terms of samples of the function and of its first k derivatives, taken at time intervals (k + 1) times as far apart as is required for samples of the function value alone. This extends earlier work by Jagerman and Fogel [18]. Results bearing both on the uncertainty principle and on approximate sampling theorems, i.e., theorems concerning functions which include all but a fraction $\delta_1$ of their energy in bandwidth W and all but a fraction $\delta_2$ of their energy in a time interval of duration T, are the subject of active current work.
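For reference, the classical sampling theorem that Linden and Abramson generalize reconstructs a function band-limited to W cycles per second from its values at intervals T = 1/(2W). The Python sketch below implements that classical interpolation only. The derivative-sample generalization, with spacing (k + 1)T, is not implemented, and the bandwidth, test signal and helper names are assumptions for illustration. Because only finitely many samples are used, the result is approximate between sampling instants.

```python
import math
from typing import List


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples: List[float], W: float, t: float) -> float:
    """Whittaker-Shannon interpolation from samples[n] = f(n / (2W)), n = 0, 1, ...
    Exact for a function band-limited to W only with infinitely many samples;
    the finite sum here leaves a truncation error."""
    return sum(fn * sinc(2 * W * t - n) for n, fn in enumerate(samples))


W = 4.0                 # assumed bandwidth, cycles per second
T = 1 / (2 * W)         # sampling interval for function values alone


def f(t: float) -> float:
    """Test signal band-limited to W: two shifted sinc pulses."""
    return sinc(2 * W * t - 100.3) - 0.5 * sinc(2 * W * t - 101.7)


samples = [f(n * T) for n in range(200)]
t = 100.5 * T           # a point midway between two sampling instants
print(f(t), reconstruct(samples, W, t))
```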


BIBLIOGRAPHY


[1] N. M. Abramson, "A class of systematic codes for non-independent errors," IRE Trans. on Information Theory, vol. IT-5, pp. 150-157; December, 1959.
[2] D. Blackwell, L. Breiman, and A. J. Thomasian, "Proof of Shannon's transmission theorem for finite-state indecomposable channels," Ann. Math. Stat., vol. 29, pp. 1209-1220; December, 1958.
[3] D. Blackwell, L. Breiman, and A. J. Thomasian, "The capacity of a class of channels," Ann. Math. Stat., vol. 30, pp. 1229-1241; December, 1959.
[4] R. C. Bose and D. K. Ray-Chaudhuri, "On a class of error-correcting binary group codes," Inform. and Control, vol. 3, pp. 68-79; March, 1960.
[5] R. C. Bose and D. K. Ray-Chaudhuri, "Further results on error-correcting binary group codes," Inform. and Control, vol. 3, pp. 279-290; September, 1960.
[6] L. Calabi and H. G. Haefeli, "A class of binary systematic codes correcting errors occurring at random and in bursts," IRE Trans. on Information Theory, vol. IT-5, pp. 79-94; May, 1959.
[7] P. Elias, "Error-free coding," IRE Trans. on Information Theory, vol. IT-4, pp. 29-37; September, 1954.
[8] P. Elias, "Coding for noisy channels," 1955 IRE National Convention Record, pt. 4, pp. 37-46.
[9] B. Elspas, "The theory of autonomous linear sequential networks," IRE Trans. on Information Theory, vol. IT-5, pp. 45-60; May, 1959.
[10] M. A. Epstein, "Algebraic decoding for a binary erasure channel," 1958 IRE National Convention Record, pt. 4, pp. 56-69.
[11] A. Feinstein, "A new basic theorem in information theory," IRE Trans. on Information Theory, vol. IT-4, pp. 2-22; September, 1954.
[12] A. Feinstein, "Foundations of Information Theory," McGraw-Hill Book Co., Inc., New York, N. Y.; 1958.
[13] A. Feinstein, "On the coding theorem and its converse for finite-memory channels," Inform. and Control, vol. 2, pp. 25-44; April, 1959.
[14] J. H. Green and R. L. San Soucie, "An error-correcting encoder and decoder of high efficiency," Proc. IRE, vol. 46, pp. 1741-1743; October, 1958.
[15] D. W. Hagelbarger, "Recurrent codes: Easily mechanized, burst-correcting, binary codes," Bell Sys. Tech. J., vol. 38, pp. 969-984; July, 1959.
[16] R. W. Hamming, "Error-detecting and error-correcting codes," Bell Sys. Tech. J., vol. 29, pp. 147-160; April, 1950.
[17] D. A. Huffman, "A linear circuit viewpoint on error-correcting codes," IRE Trans. on Information Theory, vol. IT-2, pp. 20-28; September, 1956.
[18] D. L. Jagerman and L. J. Fogel, "Some general aspects of the sampling theorem," IRE Trans. on Information Theory, vol. IT-2, pp. 139-146; December, 1956.
[19] I. Kay and R. A. Silverman, "On the uncertainty relation for real signals," Inform. and Control, vol. 2, pp. 396-397; December, 1959.
[20] C. Y. Lee, "Some properties of nonbinary error-correcting codes," IRE Trans. on Information Theory, vol. IT-4, pp. 77-81; September, 1958.
[21] R. Leipnik, "The extended entropy uncertainty principle," Inform. and Control, vol. 3, pp. 18-25; March, 1960.
[22] D. A. Linden and N. M. Abramson, "A generalization of the sampling theorem," Inform. and Control, vol. 3, pp. 26-31; March, 1960.
[23] K. E. Perry, "An error-correcting encoder and decoder for phone line data," 1958 IRE WESCON Convention Record, pt. 4, pp. 21-26.
[24] W. W. Peterson and A. B. Fontaine, "Group code equivalence and optimum codes," IRE Trans. on Information Theory, vol. IT-5, pp. 60-70; May, 1959.
[25] W. W. Peterson, "Encoding and error-correction procedures for the Bose-Chaudhuri codes," IRE Trans. on Information Theory, vol. IT-6, pp. 459-470; September, 1960.
[26] W. W. Peterson, "Error-Detecting and Error-Correcting Codes," Technology Press, Cambridge, Mass., Res. Mono.
[27] E. Prange, "Coset Equivalence in the Analysis and Decoding of Group Codes," AF Cambridge Res. Ctr., Cambridge, Mass., Tech. Note AFCRC-TR-59-164; June, 1959.
[28] I. S. Reed, "A class of multiple-error-correcting codes," IRE Trans. on Information Theory, vol. IT-4, pp. 38-49; September, 1954.
[29] W. F. Schreiber and C. F. Knapp, "TV bandwidth reduction by digital coding," 1958 IRE National Convention Record, pt. 4, pp. 88-99.
[30] F. J. Bloom, et al., "Improvement of binary transmission by null-zone reception," Proc. IRE, vol. 45, pp. 963-975; July, 1957.
[31] C. E. Shannon, "The Mathematical Theory of Communication," University of Illinois Press, Urbana, Ill.; 1949.
[32] C. E. Shannon, "Certain results in coding theory for noisy channels," Inform. and Control, vol. 1, pp. 6-25; September, 1957.
[33] C. E. Shannon, "Channels with side-information at the transmitter," IBM J., vol. 2, pp. 289-293; October, 1958.
[34] C. E. Shannon, "Probability of error for optimal codes in a Gaussian channel," Bell Sys. Tech. J., vol. 38, pp. 611-656; May, 1959.
[35] D. Slepian, "A class of binary signalling alphabets," Bell Sys. Tech. J., vol. 35, pp. 203-234; January, 1956.
[36] D. Slepian, "A note on two binary signalling alphabets," IRE Trans. on Information Theory, vol. IT-2, pp. 84-86; June, 1956.
[37] A. J. Stam, "Some inequalities satisfied by the quantities of information of Fisher and Shannon," Inform. and Control, vol. 2, pp. 101-112; June, 1959.
[38] W. Ulrich, "Non-binary error-correcting codes," Bell Sys. Tech. J., vol. 36, pp. 1341-1388; November, 1957.
[39] J. Wolfowitz, "Information theory for mathematicians," Ann. Math. Stat., vol. 29, pp. 351-356; June, 1958.
[40] J. Wolfowitz, "Strong converse of the coding theorem," Inform. and Control, vol. 3, pp. 89-93; March, 1960.
[41] J. M. Wozencraft, "Sequential Decoding for Reliable Communication," Res. Lab. of Electronics, Mass. Inst. Tech., Cambridge, Tech. Rept. 325; August, 1957; in revised form, Technology Press, Cambridge, Mass., Res. Mono.; 1960.
[42] N. Zierler, "Linear recurring sequences," J. Soc. Indust. Appl. Math., vol. 7, pp. 31-48; March, 1959.


PART 2: RANDOM PROCESSES 
P. SWERLING†, MEMBER, IRE


The work on random processes in the period under consideration may be conveniently summarized under three main headings: statistical properties of the output of nonlinear devices, estimation theory for random processes, and representation theory for random processes.

† Rand Corp., Santa Monica, Calif.

Under the first heading, the investigations concern the statistical properties of the output of a nonlinear device, or of a linear filter following a nonlinear device, when the input is a random process having prescribed statistics. These problems are of great interest since this is a model for many types of receivers. The period 1957-1960, continuing earlier work, has seen the build-up of a large inventory of results and of methods for attacking this class of problems.

One of the most comprehensive approaches is reported on in papers by Darling and Siegert, and by Siegert [1]-[3]. These papers, published in 1957 and 1958, report
on work actually done earlier. The problem considered 
is that of finding the (first-order) probability distribution 
function of the quantity 


$$\int_0^T \phi[x(t)]\,dt,$$


where $\phi$ is a prescribed function, and $x(t)$ is a component
of a stationary n-dimensional Markoff process. Many 
problems in the category under consideration are special 
cases of this. The approach is via the characteristic func- 
tion of the required probability distribution; it is shown 
that this characteristic function must satisfy two integral 
equations. Under certain conditions, it can also be shown 
that the characteristic function must satisfy two partial 
differential equations. 

Another type of problem in this category is the in- 
vestigation of the second- or higher-order probability 
distributions of the output, and particularly of the auto- 
correlation function of the output or the cross correlation 
between two or more such outputs. For example, Price 
in [4] gives a theorem which is useful in deriving such 
auto- and cross-correlations when the inputs are Gaussian. 
The theorem stated can be used in many cases to calculate 
the quantity 


$$R = E\left[\prod_{i=1}^{n} f_i(x_i)\right],$$

where $(x_1, \ldots, x_n)$ is a Gaussian vector and the $f_i$ are prescribed functions.
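As a worked illustration of how such a theorem is used, consider the two-variable case in the form in which Price's theorem is usually quoted: zero-mean jointly Gaussian $x_1, x_2$ with variances held fixed and covariance $\rho = E[x_1 x_2]$, under suitable regularity conditions on $f_1, f_2$. The square-law device below is an assumed example, not one taken from the report.

```latex
% Two zero-mean jointly Gaussian variables, variances held fixed, \rho = E[x_1 x_2]
\frac{\partial^{k}}{\partial \rho^{k}}\, E\!\left[f_1(x_1)\, f_2(x_2)\right]
  = E\!\left[f_1^{(k)}(x_1)\, f_2^{(k)}(x_2)\right]

% Worked case: square-law device, f_1(x) = f_2(x) = x^2, E[x_1^2] = E[x_2^2] = 1
\frac{\partial}{\partial \rho}\, E\!\left[x_1^2 x_2^2\right] = E\!\left[(2x_1)(2x_2)\right] = 4\rho,
\qquad
E\!\left[x_1^2 x_2^2\right]\Big|_{\rho = 0} = E[x_1^2]\,E[x_2^2] = 1

E\!\left[x_1^2 x_2^2\right] = 1 + \int_0^{\rho} 4r\,dr = 1 + 2\rho^2
```

Taking $x_1 = x(t)$ and $x_2 = x(t+\tau)$ for a unit-variance stationary Gaussian input with normalized autocorrelation $\rho(\tau)$ gives the square-law output autocorrelation $1 + 2\rho^2(\tau)$, which agrees with the direct fourth-moment calculation $E[x_1^2]E[x_2^2] + 2E[x_1x_2]^2$.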

Many other papers, for example [5]-[11], have been
written giving special results and using a number of 
different approaches. 

Work has also continued on the problem of the distribu- 
tion of zero crossings of Gaussian processes [12], [13]. 

Under the heading of estimation theory for random 
processes one might first mention the subject of estimat- 
ing the spectral density of stationary Gaussian processes. 
Two references, [14] and [15], published in the period 
1957-1960, summarize much work on this problem, a 
great deal of which had been done previously (but not 
all of which had been published previously). Blackman 
and Tukey discuss two types of estimates of the power 
spectrum, viz., estimation of the autocorrelation func- 
tion, multiplication by a prescribed function of time 
called a “lag window,” followed by Fourier transforma- 
tion; or passing the observed process through a filter of 
specified transfer function and calculating the average 
power of the output. They derive expressions for the 
first and second moments of such estimates, as well as for 
the cross moments of estimates of the spectral density 
at two different frequencies. Grenander and Rosenblatt 
discuss similar types of spectral estimates, emphasizing 
and utilizing the fact that these, as well as most other 
useful estimates of spectral density, are quadratic forms 
in the observed data. They derive first- and second-order
moments, as well as asymptotic probability distributions for large observed samples, of such estimates.
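The first Blackman-Tukey procedure (estimate the autocorrelation, multiply by a lag window, Fourier transform) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a zero-mean, real, discrete-time record, frequencies in cycles per sample, and a triangular (Bartlett) lag window chosen for illustration. The function names are hypothetical, and none of the moment formulas derived in [14] or [15] are reproduced.

```python
import math
import random
from typing import List


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """Biased sample autocorrelation r[k] = (1/N) * sum_n x[n] * x[n+k], k = 0..max_lag.
    Assumes the process has zero mean (no mean is subtracted)."""
    N = len(x)
    return [sum(x[n] * x[n + k] for n in range(N - k)) / N for k in range(max_lag + 1)]


def lag_window_spectrum(x: List[float], max_lag: int, freqs: List[float]) -> List[float]:
    """Estimate autocorrelation, multiply by a lag window, Fourier transform.
    freqs are in cycles per sample; the triangular (Bartlett) window is an assumed choice."""
    r = autocorrelation(x, max_lag)
    w = [1.0 - k / (max_lag + 1) for k in range(max_lag + 1)]
    return [w[0] * r[0] + 2.0 * sum(w[k] * r[k] * math.cos(2 * math.pi * f * k)
                                    for k in range(1, max_lag + 1))
            for f in freqs]


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# For unit-variance white noise every value should be close to 1.
print(lag_window_spectrum(noise, max_lag=20, freqs=[0.0, 0.1, 0.25, 0.4]))
```

The maximum lag trades variance against resolution: a longer lag window follows spectral detail more closely but averages fewer products per lag, which is the kind of trade-off the first- and second-moment expressions quantify.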

A recent paper of Grenander, Pollak, and Slepian [16] 
discusses the small sample case, relying heavily on the 
fact that spectral density estimates are usually quadratic 
forms in the observed data. 

In an interesting paper [17] Slepian has discussed the 
following hypothesis-testing problem: given an observed 
sample of a Gaussian random process, known to be 
characterized by either one of two prescribed power 
spectra, which power spectrum does the process actually 
have? It turns out that in problems of this type, the 
measures induced by the two alternative hypotheses may 
be singular with respect to each other; in which case, 
it is possible to decide between the alternatives with 
arbitrarily small error probability, and with an arbitrarily 
small sample. Slepian gives various sufficient conditions 
for this. The power spectra satisfying his conditions are, 
moreover, standard types very frequently postulated. 
This emphasizes that the mathematical model one chooses 
must be carefully chosen to be appropriate to the problem 
one is trying to solve. 

Another type of estimation problem for random processes is considered by Swerling [18]. Suppose a prescribed
waveform, depending on one or more unknown parameters, is observed in additive Gaussian noise having
prescribed autocovariance function and zero mean. Ex- 
pressions are derived for the greatest lower bound for 
the variance of estimates of the unknown parameters 
having prescribed bias. These greatest lower bounds are 
found to coincide in certain special cases with the variance, 
obtained by Woodward, of maximum likelihood estimates 
of the unknown parameters. Similar problems are in- 
vestigated by Middleton [19]. 
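The simplest instance of such a bound, stated here as an assumed special case rather than as Swerling's general result (which allows prescribed bias and arbitrary noise autocovariance), is one unknown amplitude $A$ of a known waveform $s(t)$ in white Gaussian noise of two-sided spectral density $N_0/2$, with unbiased estimates. The Cramér-Rao bound and the maximum likelihood estimate then coincide:

```latex
% Assumed special case: r(t) = A s(t) + n(t), 0 <= t <= T, s(t) known,
% n(t) white Gaussian noise with two-sided spectral density N_0/2.
\operatorname{Var}(\hat{A}) \;\ge\; \frac{N_0}{2\int_0^T s^2(t)\,dt}
\qquad \text{for every unbiased estimate } \hat{A}

\hat{A}_{\mathrm{ML}} = \frac{\int_0^T r(t)\,s(t)\,dt}{\int_0^T s^2(t)\,dt},
\qquad
\operatorname{Var}(\hat{A}_{\mathrm{ML}})
  = \frac{(N_0/2)\int_0^T s^2(t)\,dt}{\left(\int_0^T s^2(t)\,dt\right)^2}
  = \frac{N_0}{2\int_0^T s^2(t)\,dt}
```

Here the maximum likelihood estimate is unbiased and attains the bound exactly; for parameters that enter the waveform nonlinearly, such as delay or frequency, agreement is generally only approximate.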

In the field of representation theory for random 
processes, work has continued on the subject of repre- 
sentation of nonlinear operations on random processes— 
especially for Gaussian processes. Papers by Zadeh [20] 
and Bose [21], and a book by Wiener [22] deal with this 
problem. The approach followed is, first, to express the initial random process $\{x(t)\}$ as a series

$$x(t) = \sum_{n=1}^{\infty} u_n\,\alpha_n(t), \qquad u_n = \int x(t)\,\alpha_n(t)\,dt,$$

where $\{\alpha_n(t)\}$ is a set of orthonormal functions over the interval of definition of $\{x(t)\}$. If $\{x(t)\}$ is Gaussian, the $u_n$ are Gaussian and, if the $\alpha_n(t)$ are properly chosen, can be made independent. Any linear or nonlinear functional of $\{x(t)\}$ can then be regarded as a function of $u_1, \ldots, u_n, \ldots$. Second, one may choose a set of functions of the variables $u_n$ which are orthonormal in the stochastic sense (as explained, for example, in Zadeh [20]) with respect to the process $\{x(t)\}$. Then, nonlinear functionals of $\{x(t)\}$ may be expanded in a series of the orthogonal functions of the variables $u_n$.
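For a single standard Gaussian variable u, a standard choice of functions that are orthonormal in the stochastic sense is the normalized Hermite polynomials He_n(u)/sqrt(n!), in the spirit of Wiener's Hermite-polynomial development. Products of such polynomials in independent variables u_1, u_2, ... then extend the set to several variables. The Python sketch below checks E[He_m(u) He_n(u)] = n! when m = n and 0 otherwise, exactly, using the Gaussian moments E[u^k] = (k - 1)!! for even k. It is an illustration only, and all helper names are hypothetical.

```python
from math import factorial
from typing import List

Poly = List[int]  # integer coefficients, lowest degree first


def hermite_he(n: int) -> Poly:
    """Probabilists' Hermite polynomial He_n via He_{k+1}(u) = u He_k(u) - k He_{k-1}(u)."""
    prev, cur = [1], [0, 1]           # He_0 = 1, He_1 = u
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [0] + cur               # u * He_k
        for i, c in enumerate(prev):  # minus k * He_{k-1}
            nxt[i] -= k * c
        prev, cur = cur, nxt
    return cur


def poly_mul(a: Poly, b: Poly) -> Poly:
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def gaussian_moment(k: int) -> int:
    """E[u^k] for u ~ N(0, 1): zero for odd k, (k - 1)!! for even k."""
    if k % 2:
        return 0
    m = 1
    for j in range(k - 1, 0, -2):
        m *= j
    return m


def expectation(p: Poly) -> int:
    return sum(c * gaussian_moment(k) for k, c in enumerate(p))


for m in range(5):
    row = [expectation(poly_mul(hermite_he(m), hermite_he(n))) for n in range(5)]
    print(row, "diagonal should be", factorial(m))  # off-diagonal entries are 0
```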

Other research in the field of representation theory has treated such subjects as: use of bi-orthonormal expansions [11], envelopes of waveforms [23], [24], the sampling theorem and related topics [25], [26], and harmonic analysis of multidimensional processes [27]. Much of this work in representation theory provides useful tools for attacking the problems discussed under the first two headings above.


BIBLIOGRAPHY 


[1] D. A. Darling and A. J. F. Siegert, “A systematic approach to a class of problems in the theory of noise and other random phenomena, part I,” IRE Trans. on Information Theory, vol. IT-3, pp. 32-37; March, 1957.

[2] A. J. F. Siegert, “A systematic approach to a class of problems in the theory of noise and other random phenomena, part II, examples,” IRE Trans. on Information Theory, vol. IT-3, pp. 38-43; March, 1957.

[3] A. J. F. Siegert, “A systematic approach to a class of problems in the theory of noise and other random phenomena, part III, examples,” IRE Trans. on Information Theory, vol. IT-4, pp. 4-14; March, 1958.

[4] R. Price, “A useful theorem for nonlinear devices having Gaussian inputs,” IRE Trans. on Information Theory, vol. IT-4, pp. 69-72; June, 1958.

[5] R. Leipnik, “The effect of instantaneous nonlinear devices on cross correlation,” IRE Trans. on Information Theory, vol. IT-4, pp. 73-76; June, 1958.

[6] J. N. Pierce, “A Markoff envelope process,” IRE Trans. on Information Theory, vol. IT-4, pp. 163-166; December, 1958.

[7] J. Keilson, N. D. Mermin, and P. Bello, “A theorem on cross correlation between noisy channels,” IRE Trans. on Information Theory, vol. IT-5, pp. 77-79; June, 1959.

[8] C. W. Helstrom and C. T. Isley, “Two notes on a Markoff envelope process,” IRE Trans. on Information Theory, vol. IT-5, pp. 139-140 (Correspondence); September, 1959.

[9] J. A. McFadden, “The probability density of the output of an RC filter when the input is a binary random process,” IRE Trans. on Information Theory, vol. IT-5, pp. 174-178; December, 1959.

[10] L. L. Campbell, “On the use of Hermite expansions in noise problems,” SIAM J., vol. 5, pp. 244-249; December, 1957.

[11] R. Leipnik, “Integral equations, biorthonormal expansions, and noise,” SIAM J., vol. 7, pp. 6-30; March, 1959.

[12] C. W. Helstrom, “The distribution of the number of crossings of a Gaussian stochastic process,” IRE Trans. on Information Theory, vol. IT-3, pp. 232-237; December, 1957.

[13] W. M. Brown, “Some results on noise through circuits,” IRE Trans. on Information Theory, vol. IT-5, pp. 217-227; May, 1959.

[14] U. Grenander and M. Rosenblatt, “Statistical Analysis of Stationary Time Series,” John Wiley and Sons, Inc., New York, N. Y.; 1957.

[15] R. B. Blackman and J. W. Tukey, “The Measurement of Power Spectra from the Point of View of Communications Engineering,” Dover Publications, Inc., New York, N. Y.; 1959.

[16] U. Grenander, H. O. Pollak, and D. Slepian, “The distribution of quadratic forms in normal variates: a small sample theory with applications to spectral analysis,” SIAM J., vol. 7, pp. 374-401; December, 1959.

[17] D. Slepian, “Some comments on the detection of Gaussian signals in Gaussian noise,” IRE Trans. on Information Theory, vol. IT-4, pp. 65-68; June, 1958.

[18] P. Swerling, “Parameter estimation for waveforms in additive Gaussian noise,” SIAM J., vol. 7, pp. 152-166; June, 1959.

[19] D. Middleton, “A note on the estimation of signal waveform,” IRE Trans. on Information Theory, vol. IT-5, pp. 86-89; June, 1959.

[20] L. A. Zadeh, “On the representation of nonlinear operators,” 1957 IRE WESCON Convention Record, pt. 2, pp. 105-113.

[21] A. G. Bose, “Nonlinear system characterization and optimization,” IRE Trans. on Information Theory, vol. IT-5, pp. 30-40; May, 1959.

[22] N. Wiener, “Nonlinear Problems in Random Theory,” John Wiley and Sons, Inc., New York, N. Y.; 1958.

[23] R. Arens, “Complex processes for envelopes of normal noise,” IRE Trans. on Information Theory, vol. IT-3, pp. 204-207; September, 1957.

[24] J. Dugundji, “Envelopes and pre-envelopes of real waveforms,” IRE Trans. on Information Theory, vol. IT-4, pp. 53-57; March, 1958.

[25] A. V. Balakrishnan, “A note on the sampling principle for continuous signals,” IRE Trans. on Information Theory, vol. IT-3, pp. 143-146; June, 1957.

[26] R. M. Lerner, “The representation of signals,” IRE Trans. on Information Theory, vol. IT-5, pp. 197-216; May, 1959.

[27] N. Wiener and P. Masani, “The prediction theory of multivariate stochastic processes,” Acta Math., vol. 98, 1957; vol.


PART 3: PATTERN RECOGNITION 
ARTHUR GILL†, MEMBER, IRE


INTRODUCTION 


PATTERN recognition, in its widest sense, cuts across many fields of engineering interest, from signal detection to learning theory, and from mechanical translation to decision-making techniques. Inasmuch as the problem of recognizing patterns is that of simulating human thinking processes, it is also closely related to nonengineering fields such as physiology, psychology, linguistics and cryptology. No attempt is made in this report to summarize the developments in all these areas. Rather, pattern recognition developments are reported only to the extent that they represent a contribution to the theory of information. The papers referred to below are primarily those published in American engineering journals from 1957 to date; consequently, the emphasis in this report is placed on the recognition of visual patterns rather than vocal or other patterns, which are mainly covered in nonengineering publications.

The keen engineering interest in the recognition of visual patterns stems from the recent emergence of two urgent problems: a) How can redundancies be removed from television pictures, so that video signals can be transmitted in a greatly reduced bandwidth? b) How can printed documents be read automatically, so that the most serious bottleneck, the human typist or card puncher, can be eliminated from digital data-processing systems? Although these two topics are treated separately in the literature, both represent the same general problem of pattern recognition. In the following review, this problem will be divided, rather artificially, into three phases: 1) redundancy studies, 2) recognition procedures, and 3) learning systems.
REDUNDANCY STUDIES

Both the compression of television bandwidth and the design of character recognizers require the determination of the source redundancies. Knowledge of these redundancies yields effective recognition criteria, as well as economies in the contemplated recognition system.

† Dept. of Elec. Engrg., University of California, Berkeley, Calif.

Elias, et al.: Progress in Information Theory in the U.S.A., 1957-1960
Powers and Staras [33] suggest separating picture redundancy into nonpredictive redundancy, resulting from a nonoptimal first-order probability distribution, and predictive redundancy, resulting from statistical correlation between successive signals. Experimental work shows that nonpredictive redundancy in television pictures is essentially zero, while predictive redundancy permits at least a two-to-one saving in bandwidth requirements. Deutsch [6] also concludes that a two-to-one saving is possible for typewritten or printed alphabetic characters. Kovasznay and Arman [25] propose a new practical method for measuring the autocorrelation function of two-dimensional random patterns; with this method, the entire function can be obtained at once in the form of a light distribution on a plane.
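The two kinds of redundancy described by Powers and Staras can be estimated from a quantized signal. Nonpredictive redundancy is log2(levels) minus the first-order entropy H(X). Predictive redundancy is H(X) minus the conditional entropy given the preceding sample. The sketch below conditions only on the immediately preceding sample, which is a first-order Markoff simplification we assume, and the toy scan line is hypothetical.

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Entropy in bits of the empirical distribution given by a Counter."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)

def redundancies(samples, levels):
    """Return (nonpredictive, predictive) redundancy in bits per sample:
    nonpredictive = log2(levels) - H(X)
    predictive    = H(X) - H(X_n | X_{n-1})   (previous sample only)."""
    h1 = entropy(Counter(samples))
    h_pair = entropy(Counter(zip(samples, samples[1:])))
    h_prev = entropy(Counter(samples[:-1]))
    h_cond = h_pair - h_prev      # H(X_n | X_{n-1}) = H(X_{n-1}, X_n) - H(X_{n-1})
    return log2(levels) - h1, h1 - h_cond

if __name__ == "__main__":
    line = ([0] * 8 + [1] * 8) * 16   # hypothetical black/white scan line with long runs
    print(redundancies(line, levels=2))
```

In this toy line black and white are equally frequent, so the nonpredictive redundancy is zero. All of the redundancy, roughly 0.47 bit per sample here, is predictive. This mirrors the qualitative finding reported for television pictures.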

On a more theoretical level, Gill [13] derives bounds on the number of nonredundant cells in noiseless and noisy patterns, and presents an algorithm for locating these cells. Stearns [37] proposes a method for removing redundancies from given patterns, which is basically a method for reducing Boolean equations containing a large number of “don't care” terms.

A recognition system designed to serve human beings must take into account not only the source characteristics but also the characteristics of the human “load.” Schreiber and Knapp [35] exploit both picture redundancies and the limitations of human vision to code video signals and to transmit the code at a uniform rate. Graham [16] describes a series of subjective experiments whose purpose is to evaluate the range of transmitted pictures that human observers interpret satisfactorily.

Several results have been obtained that allow the efficiency of automatic pattern recognizers to be compared with that of human recognizers. Pierce and Karlin [32] report that human beings can transmit printed information, by reading, at a rate of up to 50 bits per second. Neisser and Weene [31] find the accuracy of human recognition of hand-printed characters to be less than 97 per cent. Singer [36] concludes that the human recognition process is limited not by the visual channel, whose capacity is 10^10 bits per second, but by the brain.

Michel [28] shows how the statistical characteristics of the pattern source can help in devising efficient coding schemes for picture transmission. A particular scheme, known as “variable-length coding,” is described by Michel, Fleckenstein and Kretzmer [29]. In this scheme, the transmission rate can be made proportional to the source complexity, which results in considerable saving in bandwidth requirements. Capon [4] computes the theoretical bounds on this saving, treating patterns as first-order Markoff processes. Heasly [22] shows how a character-sensing communication channel can be matched to the source to yield the maximum over-all information flow.
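The kind of bound Capon treats can be sketched for a binary (black/white) scan line modeled as a two-state first-order Markoff source. The transition probabilities a (white to black) and b (black to white) are assumed here. The source entropy in bits per cell is (b·H(a) + a·H(b))/(a + b), where H is the binary entropy function. This is the lowest average rate any lossless code can reach, run-length codes included. The sketch pairs that formula with a plain run-length encoder. Names and parameter values are hypothetical, and the formula is a standard Markoff-source result rather than a quotation from Capon's paper.

```python
from itertools import groupby
from math import log2

def run_lengths(line):
    """Encode a nonempty binary scan line as (first cell value, list of run lengths)."""
    return line[0], [len(list(g)) for _, g in groupby(line)]

def binary_entropy(p):
    """H(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def markov_entropy_rate(a, b):
    """Entropy (bits per cell) of a two-state first-order Markoff source with
    P(white -> black) = a and P(black -> white) = b."""
    return (b * binary_entropy(a) + a * binary_entropy(b)) / (a + b)

if __name__ == "__main__":
    print(run_lengths([0, 0, 0, 1, 1, 0]))
    # Long runs (small a, b) give an entropy far below 1 bit per cell.
    print(markov_entropy_rate(0.05, 0.2))
```

With a = 0.05 and b = 0.2 the entropy is about 0.37 bit per cell, compared with 1 bit per cell for uncoded transmission. That ratio indicates the order of saving available when runs are long.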


RECOGNITION PROCEDURES 


Basically, the process of pattern recognition is that of sequentially sorting a large number of elements into a relatively small number of classes, according to a predetermined set of characteristics. Burge [3] tackles the sorting problem from a general point of view, devising optimal sorting strategies by minimizing appropriate “sorting trees.” He concludes that the best strategy depends on the amount of order already existing in the data, where “order” is defined as the minimum amount of work required to sort the data into complete order. Hartmanis [21] develops an algebra of partitions to facilitate the decomposition of a complex sequential process into several simpler ones; he also formulates the necessary and sufficient conditions for such a decomposition. McLachlan [27] proposes a special mathematical discipline, called “description mechanics,” for the characterization of general recognition processes. In this proposal, a pattern is a special case of a “description domain,” divisible into cells whose size is determined by the prescribed resolution; pattern classes are special cases of “occupant classes,” whose number is determined by the prescribed recognition accuracy.

The sorting procedure itself is carried out by searching for various properties in the unknown pattern and comparing them with the properties of a “reference set” of patterns. Unger [39] describes a system, employing a space-oriented computer, capable of detecting a predetermined set of geometrical properties; associating each reference pattern with a subset of these properties yields the recognition of the unknown pattern. Similar systems are proposed by Bomba [2], who extracts the geometrical properties by operations on small sections of the unknown pattern; by Greenias, Hoppel, Kloomok and Osborne [19], who recognize patterns by the relative size and position of the pattern elements; by Kamentsky [23], who extracts the geometrical properties with neuronlike sensing elements; and by Dimond [8], who processes handwritten characters by subjecting them to special coordinate constraints. Tersoff [38] describes a device which facilitates the property-extraction operations by minimizing the effects of pattern tilt and extraneous marks. Kirsch, Cahn, Ray, and Urban [24] describe laboratory apparatus intended for finding suitable sets of properties for given patterns. The problem of designing logical circuits to carry out the recognition procedure was treated by Evey [10], who proposed various schemes for optimizing this logic.
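The property-matching idea can be sketched as follows. A few binary property detectors are run on the unknown pattern. Each reference pattern is represented by the subset of properties it exhibits, and the unknown is assigned to the reference whose subset differs least from the detected one. The four detectors, the 5×5 prototypes, and the nearest-subset rule are all hypothetical choices made for illustration. They are not taken from Unger's or any other cited system.

```python
def grid(rows):
    """Turn strings such as "00100" into a list of 0/1 rows."""
    return [[int(ch) for ch in row] for row in rows]

# Hypothetical geometrical property detectors on a binary grid.
def has_full_column(g):
    return any(all(row[c] for row in g) for c in range(len(g[0])))

def has_full_row(g):
    return any(all(row) for row in g)

def top_row_filled(g):
    return all(g[0])

def bottom_row_filled(g):
    return all(g[-1])

PROPERTIES = (has_full_column, has_full_row, top_row_filled, bottom_row_filled)

def property_set(g):
    """Names of the properties detected in pattern g."""
    return frozenset(p.__name__ for p in PROPERTIES if p(g))

PROTOTYPES = {
    "T": ["11111", "00100", "00100", "00100", "00100"],
    "L": ["10000", "10000", "10000", "10000", "11111"],
    "I": ["00100", "00100", "00100", "00100", "00100"],
}
REFERENCES = {label: property_set(grid(rows)) for label, rows in PROTOTYPES.items()}

def recognize(unknown, references=REFERENCES):
    """Label whose property subset has the smallest symmetric difference
    from the properties found in the unknown pattern."""
    found = property_set(unknown)
    return min(references, key=lambda label: len(found ^ references[label]))

if __name__ == "__main__":
    noisy_t = grid(["11111", "00100", "00100", "00101", "00100"])  # stray mark
    print(recognize(noisy_t))
```

Because the stray mark does not change any detected property, the noisy "T" is still matched to the "T" reference. This illustrates why property-based schemes can tolerate extraneous marks.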

Glantz [14] formulates a general recognition procedure employing an “operator,” which specifies the method of comparison between the unknown and reference patterns, and a “threshold,” which must be exceeded for satisfactory recognition. Some of these ideas are carried out by Gold [15], who applies a set of fixed “language rules” and statistically determined threshold tests to recognize hand-sent Morse code.




One criterion for satisfactory recognition that has aroused considerable interest is the minimum-average-cost (Bayes risk) criterion proposed by Chow [5]. In Chow's recognition system, the pattern signals, the noise statistics, and the costs of misrecognition are known in advance. On the basis of this knowledge, the conditional probability of the unknown noisy pattern is computed and weighted with respect to every possible noiseless pattern, and the identification is made so as to minimize the expected cost. The choice of “noiseless” patterns to be used as references in this system is discussed by Flores and Grey [11]. They give criteria for optimizing these patterns in the case of white Gaussian noise, and prove that the best pattern coding under such conditions is not necessarily binary. Dickinson [7] describes the application of Chow's system to slit-scan recognition of low-noise and small-size pattern sets. The design of synthetic pattern sets for reference purposes is discussed by Flores and Ragonese [12], who give formulas based on the geometry of the patterns and the empirical properties of the sensing apparatus. Greenias and Hill [18] define measures of character quality and style to aid in the design of synthetic characters.
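A minimal sketch of the minimum-expected-cost rule follows. It assumes a noise model in which each binary cell is flipped independently with probability p; Chow's paper treats more general statistics. Posteriors are prior times likelihood, normalized. The decision is the label j that minimizes the sum over i of cost[j][i] times posterior[i], where cost[j][i] is the cost of deciding j when the true pattern is i. The reference patterns, priors, and cost tables are hypothetical.

```python
def likelihood(observed, reference, p):
    """P(observed | reference) when each binary cell flips independently with prob. p."""
    d = sum(o != r for o, r in zip(observed, reference))
    return p ** d * (1 - p) ** (len(reference) - d)

def bayes_decision(observed, references, priors, cost, p):
    """Return (label with minimum expected cost, expected cost of every label).
    cost[j][i] is the cost of deciding j when the true pattern is i."""
    joint = {i: priors[i] * likelihood(observed, ref, p) for i, ref in references.items()}
    total = sum(joint.values())
    posterior = {i: v / total for i, v in joint.items()}
    risk = {j: sum(cost[j][i] * posterior[i] for i in posterior) for j in references}
    return min(risk, key=risk.get), risk

REFERENCES = {"A": (1, 1, 1, 1, 0, 0, 0, 0), "B": (0, 0, 0, 0, 1, 1, 1, 1)}
PRIORS = {"A": 0.5, "B": 0.5}
ZERO_ONE = {"A": {"A": 0, "B": 1}, "B": {"A": 1, "B": 0}}
ASYMMETRIC = {"A": {"A": 0, "B": 5}, "B": {"A": 1, "B": 0}}  # calling a B "A" is costly

if __name__ == "__main__":
    print(bayes_decision((1, 1, 1, 0, 1, 0, 0, 0), REFERENCES, PRIORS, ZERO_ONE, 0.1))
    print(bayes_decision((1, 1, 0, 0, 1, 1, 0, 0), REFERENCES, PRIORS, ASYMMETRIC, 0.1))
```

The second call uses an observation equally far from both references, so the posteriors are equal. The asymmetric costs then tip the decision to "B". This is the effect the cost weighting in Chow's criterion is meant to capture.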


LEARNING SYSTEMS


As indicated in the previous section, the prerequisite for the design of an efficient pattern recognizer is the determination of a set of invariants in terms of which the patterns can be uniquely defined. While many investigators select this set on the basis of intuition and personal experience, others prefer to let a computer make the selection through some “learning” process. Doyle [9] describes a system which collects statistical data on known patterns in order to formulate a series of tests to be used later on unknown patterns. The pattern recognizer proposed by Bledsoe and Browning [1] “learns” the patterns by marking the states of cell pairs randomly distributed over the pattern area. A general learning system, the perceptron, is described by Rosenblatt [34] and Murray [30]. This system, comprising logically simplified neural elements, learns how to discriminate and identify perceptual patterns after undergoing a special “training” program. Mattson [26] describes a logical system which can adjust itself to realize various pattern-processing requirements. Uttley [40] proposes an “inductive inference” machine which can imitate trial-and-error learning by computing conditional probabilities of known patterns.
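The error-correction idea behind perceptron training can be sketched with a single threshold unit. Whenever a training pattern is misclassified, its cell values are added to the weights (for class +1) or subtracted from them (for class -1). This is a simplified textbook form that omits the random association layer of Rosenblatt's machines. The 3×3 toy patterns, which separate a mark in the left column from a mark in the right column, are hypothetical.

```python
def train_perceptron(samples, epochs=20, rate=1):
    """Error-correction training of one threshold unit.
    samples: list of (tuple of 0/1 cell values, label +1 or -1)."""
    n = len(samples[0][0])
    w, b = [0] * n, 0
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * s <= 0:                      # misclassified or on the boundary
                w = [wi + rate * y * xi for wi, xi in zip(w, x)]
                b += rate * y
                errors += 1
        if errors == 0:                         # every training pattern correct
            break
    return w, b

def classify(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# 3x3 patterns flattened row by row: +1 = mark in left column, -1 = right column.
SAMPLES = [
    ((1, 0, 0, 1, 0, 0, 1, 0, 0), +1),
    ((1, 0, 0, 1, 0, 0, 0, 0, 0), +1),
    ((0, 0, 0, 1, 0, 0, 1, 0, 0), +1),
    ((1, 1, 0, 1, 0, 0, 1, 0, 0), +1),
    ((0, 0, 1, 0, 0, 1, 0, 0, 1), -1),
    ((0, 0, 1, 0, 0, 1, 0, 0, 0), -1),
    ((0, 0, 0, 0, 0, 1, 0, 0, 1), -1),
    ((0, 1, 1, 0, 0, 1, 0, 0, 1), -1),
]

if __name__ == "__main__":
    w, b = train_perceptron(SAMPLES)
    print(w, b, classify(w, b, (0, 0, 0, 0, 0, 0, 1, 0, 0)))
```

The toy classes are linearly separable, so training stops after a few passes. After training, the unit also classifies patterns it was never shown, such as a single mark in the lower-left cell.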

A different point of view is adopted by Greene [17], whose system memorizes “perceptual units” (Gestalten) such as a circle, a triangle, or a square in order to identify more complex patterns; the perceptual ability of this system is enhanced by making it obey certain equations of quantum mechanics. Harmon [20] describes a similar system, in which the perceptual units are recognized by means of a circular scan.


The learning systems mentioned above, chosen for their immediate applicability to pattern recognition, are representative of a much larger number of “artificial intelligence” systems, the discussion of which is beyond the scope of this report.


BIBLIOGRAPHY 


[1] W. W. Bledsoe and I. Browning, “Pattern recognition and reading by machine,” Proc. Eastern Joint Computer Conf., Boston, Mass., December 1-3, 1959; pp. 225-232.

[2] J. S. Bomba, “Alpha-numeric character recognition using local operations,” Proc. Eastern Joint Computer Conf., Boston, Mass., December 1-3, 1959; pp. 218-224.

[3] W. H. Burge, “Sorting, trees and measures of order,” Inform. and Control, vol. 1, pp. 181-197; September, 1958.

[4] J. Capon, “A probabilistic model for run-length coding of pictures,” IRE Trans. on Information Theory, vol. IT-5, pp. 157-163; December, 1959.

[5] C. K. Chow, “An optimum character recognition system using decision functions,” IRE Trans. on Electronic Computers, vol. EC-6, pp. 247-254; December, 1957.

[6] S. Deutsch, “A note on some statistics concerning typewritten or printed material,” IRE Trans. on Information Theory, vol. IT-3, pp. 147-148; June, 1957.

[7] W. E. Dickinson, “A character-recognition study,” IBM J. Res. and Dev., vol. 4, pp. 335-348; July, 1960.

[8] T. L. Dimond, “Devices for reading handwritten characters,” Proc. Eastern Joint Computer Conf., Washington, D. C., December 9-13, 1957; pp. 232-237.

[9] W. Doyle, “Recognition of sloppy, hand-printed characters,” Proc. Western Joint Computer Conf., San Francisco, Calif., May 3-5, 1960; pp. 133-142.

[10] R. J. Evey, “Use of a computer to design character recognition logic,” Proc. Eastern Joint Computer Conf., Boston, Mass., December 1-3, 1959; pp. 205-211.

[11] I. Flores and L. Grey, “Optimization of reference signals for character recognition systems,” IRE Trans. on Electronic Computers, vol. EC-9, pp. 54-61; March, 1960.

[12] I. Flores and F. Ragonese, “A method for synthesizing the wave form generated by a character, printed in magnetic ink, in passing beneath a magnetic reading head,” IRE Trans. on Electronic Computers, vol. EC-7, pp. 277-282; December.

[13] A. Gill, “Minimum-scan pattern recognition,” IRE Trans. on Information Theory, vol. IT-5, pp. 52-58; June, 1959.

[14] H. T. Glantz, “On the recognition of information with a digital computer,” J. ACM, vol. 4, pp. 178-189; April, 1957.

[15] B. Gold, “Machine recognition of hand-sent Morse code,” IRE Trans. on Information Theory, vol. IT-5, pp. 17-24; March, 1959.

[16] R. E. Graham, “Subjective experiments in visual communication,” 1958 IRE National Convention Record, pt. 4, pp. 100-106.

[17] P. H. Greene, “Networks for pattern perception,” Proc. Natl. Electronics Conf., vol. 15, pp. 357-369; October, 1959.

[18] E. C. Greenias and Y. M. Hill, “Considerations in the design of character recognition devices,” 1957 IRE National Convention Record, pt. 4, pp. 119-126.

[19] E. C. Greenias, et al., “Design of logic for recognition of printed characters by simulation,” IBM J. Res. and Dev., vol. 1, pp. 8-18; January, 1957.

[20] L. D. Harmon, “A line-drawing pattern recognizer,” Proc. Western Joint Computer Conf., San Francisco, Calif., May 3-5, 1960; pp. 351-364.

[21] J. Hartmanis, “Symbolic analysis of a decomposition of information processing machines,” Inform. and Control, vol. 3, pp. 154-178; June, 1960.

[22] C. C. Heasly, “Some communication aspects of character-sensing systems,” Proc. Western Joint Computer Conf., San Francisco, Calif., May 3-5, 1959; pp. 176-180.

[23] L. A. Kamentsky, “Pattern and character recognition systems: picture processing by nets of neuron-like elements,” Proc. Western Joint Computer Conf., San Francisco, Calif., May 3-5, 1959; pp. 304-309.

[24] R. A. Kirsch, et al., “Experiments in processing pictorial information with a digital computer,” Proc. Eastern Joint Computer Conf., Washington, D. C., December 9-13, 1957; pp. 221-229.

[25] L. S. G. Kovasznay and A. Arman, “Optical autocorrelation measurement of two-dimensional random patterns,” Rev. Sci. Instr., vol. 28, pp. 793-797; 1957.




[26] R. L. Mattson, “A self-organizing binary system,” Proc. Eastern Joint Computer Conf., Boston, Mass., December 1-3, 1959; pp. 212-217.

[27] D. McLachlan, “Description mechanics,” Inform. and Control, vol. 1, pp. 240-266; September, 1958.

[28] W. S. Michel, “Statistical encoding for text and picture communication,” Commun. and Electronics, vol. 77, pp. 33-36; March, 1958.

[29] W. S. Michel, W. O. Fleckenstein, and E. R. Kretzmer, “A coded facsimile system,” 1957 IRE WESCON Convention Record, pt. 2, pp. 84-93.

[30] A. E. Murray, “A review of the perceptron program,” Proc. Natl. Electronics Conf., vol. 15, pp. 346-356; October, 1959.

[31] U. Neisser and P. Weene, “A note on human recognition of hand-printed characters,” Inform. and Control, vol. 3, pp. 191-196; June, 1960.

[32] J. R. Pierce and J. E. Karlin, “Reading rates and the information rate of a human channel,” Bell Sys. Tech. J., vol. 36, pp. 497-516; March, 1957.

[33] K. H. Powers and H. Staras, “Some relations between television picture redundancy and bandwidth requirements,” Commun. and Electronics, vol. 76, pp. 492-496; September, 1957.

[34] F. Rosenblatt, “Perceptron simulation experiments,” Proc. IRE, vol. 48, pp. 301-309; March, 1960.

[35] W. F. Schreiber and C. F. Knapp, “TV bandwidth reduction by digital coding,” 1958 IRE National Convention Record, pt. 4, pp. 88-89.

[36] J. R. Singer, “Information theory and the human visual system,” J. Opt. Soc. Am., vol. 49, pp. 639-640; June, 1959.

[37] S. D. Stearns, “A method for the design of pattern recognition logic,” IRE Trans. on Electronic Computers, vol. EC-9, pp. 48-53; March, 1960.

[38] A. I. Tersoff, “Automatic registration in high-speed character-sensing equipment,” Proc. Eastern Joint Computer Conf., Washington, D. C., December 9-13, 1957; pp. 232-237.

[39] S. H. Unger, “Pattern detection and recognition,” Proc. IRE, vol. 47, pp. 1737-1752; October, 1959.

[40] A. M. Uttley, “Imitation of pattern recognition and trial-and-error learning in a conditional probability computer,” Revs. Mod. Phys., vol. 31, pp. 546-548; April, 1959.


PART 4: DETECTION THEORY 


ROBERT PRICE†, SENIOR MEMBER, IRE, AND
NORMAN ABRAMSON‡, MEMBER, IRE


INTRODUCTION 


THE PERIOD since the XII General Assembly has seen a consolidation of the closely related concepts of Wald, Woodward, Middleton and Van Meter, and Peterson, Birdsall and Fox into a fairly unified theory of detection, together with the successful application of the theory to a variety of problems. Through this approach, “optimal” detector structures for electronic systems can be synthesized, provided that the designer has a priori knowledge of the governing statistics and error costs. At the same time, older and more standard detection techniques have continued to receive attention, with the theoretical results generally stated in terms of probability of error or SNR at the detector output.
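As a minimal illustration of how a priori probabilities and error costs determine an “optimal” detector, consider the textbook binary case. This case is our assumption and is not taken from the works surveyed here. Under H1 a known sampled signal s is present in white Gaussian noise of variance sigma**2, and under H0 noise alone is present. The log-likelihood ratio reduces to a correlator, (sum(x_k s_k) - E/2)/sigma**2, where E is the signal energy. The Bayes rule compares it with ln(eta), where eta = p0(c10 - c00)/(p1(c01 - c11)) and c_ij is the cost of deciding H_i when H_j is true. Function names are hypothetical.

```python
import math

def bayes_threshold(p0, p1, c00, c01, c10, c11):
    """Likelihood-ratio threshold eta that minimizes average cost.
    c_ij is the cost of deciding H_i when H_j is true; assumes c10 > c00, c01 > c11."""
    return (p0 * (c10 - c00)) / (p1 * (c01 - c11))

def log_likelihood_ratio(x, s, sigma):
    """ln[p(x | H1) / p(x | H0)] for H1: x = s + n versus H0: x = n,
    n white Gaussian with variance sigma**2; this reduces to a correlator."""
    corr = sum(a * b for a, b in zip(x, s))
    energy = sum(v * v for v in s)
    return (corr - energy / 2) / sigma ** 2

def decide(x, s, sigma, eta):
    """Return 1 (signal present) if the likelihood ratio exceeds eta, else 0."""
    return 1 if log_likelihood_ratio(x, s, sigma) > math.log(eta) else 0

if __name__ == "__main__":
    s = [1.0, 1.0, 1.0, 1.0]
    eta = bayes_threshold(p0=0.5, p1=0.5, c00=0, c01=1, c10=1, c11=0)
    print(decide(s, s, 1.0, eta), decide([0.0] * 4, s, 1.0, eta))
```

Raising the cost of a false alarm (c10) raises eta. With c10 = 100, the same noiseless received signal x = s no longer crosses the threshold. This shows how the synthesized structure depends on the assumed costs as well as on the statistics.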

The maturing of the field of detection theory in the past three years is evidenced by the fact that four books dealing, at least in part, with detection theory were published during this period. The first of these was “Random Signals and Noise,” by Davenport and Root [1], in which several detection problems of a simple nature are discussed in the last chapter. “Principles and Applications of Random Noise Theory,” by Bendat [2], also discusses the detection problem, with particular emphasis on the errors in various autocorrelation measurements. “Introduction to Statistical Communication Theory,” by Middleton [3], is another of these four books dealing with the theory of signal detection. This comprehensive book treats a wealth of specific detection problems and calculates the performance characteristics of many optimum and suboptimum detection systems. “Statistical Theory of Signal Detection,” by Helstrom [4], is devoted to the detection problem alone, although Helstrom's definition of detection is broad enough to include the closely related subjects of signal resolution and estimation of signal parameters.

† Lincoln Lab., Mass. Inst. Tech., Lexington, Mass.
‡ Dept. of Elec. Engrg., Stanford University, Stanford, Calif.

It appears that roughly half the effort of the past three years has been devoted to specific detection problems in radar and communications. Contemporary communications studies pay considerable heed to “optimum” detection procedures, and show less inclination than the radar analyses to examine conventional, suboptimum detectors. The reason may be that the radar designer faces considerably greater a priori uncertainty about both the signal and the channel through which it comes. By contrast, relatively simple channels can usually be assumed without loss of realism in communications problems, and the communications system designer has more direct control of the signal. The appropriate optimum detectors for communications then turn out to be rather elementary, and can at present be constructed with hardly more effort than suboptimum devices require. In fact, the communications environment is generally “clean” enough that much recent work has been concerned with determining good sets of transmitted signal waveforms, with the use of an optimum receiver taken for granted.


COMMUNICATIONS 


Some problems of a practical nature associated with the 
selection of good sets of signals for various types of 
digital phase-modulation systems are discussed by Cahn 
[5], Lawton [6], [7] and Hopner [8]. A more general ap- 
proach to the problem of the selection of signals and the 
shaping of pulses was given by Sunde [9]. In his paper 
comparing AM and FM methods of pulse transmission, 
he concludes that FM has an advantage over AM for 
the case of a fixed bandwidth channel perturbed by 
additive white noise. 

Reiger [10] has looked at problems of the selection 
of a set of signals and the use of error-correcting codes. 
Some simple results seem to indicate that, for small 
block lengths, if the equipment complexity caused by a 
large number of signals can be tolerated, a greater im- 
provement may be obtained by use of these waveforms 
than by use of error-correcting codes. 




One example of a communication channel which is not “clean” is the channel with fading. One of the earliest studies of the fading channel was made by Masonson [11], who analyzes the transmission of binary messages through noise and fading with several types of systems. Turin [12] analyzed slowly fading, frequency-nonselective channels perturbed by white Gaussian noise. He obtained general expressions for the error probabilities of such a channel in both the coherent and noncoherent cases, and applied the results to FSK systems with a variety of pulse shapes. Turin [13] has also examined the selectively fading communication channel. He found that even if one of two independently fading paths is relatively weak, the error probability is considerably lower than if only the stronger path is present. Some particularly important results on the “optimum” (i.e., a posteriori probability computer) detection of signals perturbed by a “Gaussian” random channel are given in a paper by Kailath [14]. Kailath shows that the concept of detection by a matched filter (optimum for the case of a known channel) can be generalized to the case of the randomly perturbed channel. The optimum receiver for the randomly perturbed channel is still a matched filter, where, however, the “matching” is with respect to a subsidiary estimate of the output of the random channel.
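The cost of fading can be seen in standard textbook error-probability formulas for binary orthogonal FSK. Here snr stands for E/N0, or its mean value under slow, frequency-nonselective Rayleigh fading. Without fading, noncoherent detection gives P_e = (1/2)exp(-snr/2). With Rayleigh fading, noncoherent detection gives P_e = 1/(2 + snr) and coherent detection gives P_e = (1/2)(1 - sqrt(snr/(2 + snr))). These are quoted as standard results of the type Turin derived, not as his expressions, and the function names are hypothetical.

```python
import math

def pe_noncoherent_fsk(snr):
    """Binary orthogonal FSK, noncoherent detection, no fading; snr = E/N0."""
    return 0.5 * math.exp(-snr / 2)

def pe_noncoherent_fsk_rayleigh(mean_snr):
    """Same system in slow, frequency-nonselective Rayleigh fading (mean E/N0)."""
    return 1.0 / (2.0 + mean_snr)

def pe_coherent_fsk_rayleigh(mean_snr):
    """Coherent binary orthogonal FSK in slow Rayleigh fading (mean E/N0)."""
    return 0.5 * (1.0 - math.sqrt(mean_snr / (2.0 + mean_snr)))

if __name__ == "__main__":
    for db in (10, 20, 30):
        g = 10 ** (db / 10)
        print(db, pe_noncoherent_fsk(g), pe_noncoherent_fsk_rayleigh(g),
              pe_coherent_fsk_rayleigh(g))
```

Without fading, the error probability falls exponentially with SNR. With fading it falls only inversely, roughly tenfold per 10 dB. This gap is what motivates the diversity techniques discussed next.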

Another method of handling the communication problems caused by fading channels is diversity reception. Pierce [15] has analyzed the improvement available through diversity reception, and has obtained expressions for the probability of error for both square-law combining and “switch” diversity.
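For square-law combining, a standard closed form is available, assumed here and not necessarily in Pierce's notation. It applies to noncoherent binary orthogonal FSK with L independent Rayleigh-fading branches of equal mean SNR per branch. With p = 1/(2 + mean_snr), the error probability is P_e = p^L · Σ_{k=0}^{L-1} C(L-1+k, k)(1-p)^k, and L = 1 recovers the single-channel value 1/(2 + mean_snr). The function name is hypothetical, and switch diversity is not modeled.

```python
from math import comb

def pe_square_law_diversity(mean_snr_per_branch, branches):
    """Noncoherent binary orthogonal FSK, `branches` independent Rayleigh branches,
    square-law combining, equal mean E/N0 per branch."""
    p = 1.0 / (2.0 + mean_snr_per_branch)
    return p ** branches * sum(comb(branches - 1 + k, k) * (1 - p) ** k
                               for k in range(branches))

if __name__ == "__main__":
    for L in (1, 2, 4):
        print(L, pe_square_law_diversity(100.0, L))
```

At a fixed SNR per branch, each added branch multiplies the error probability by roughly another factor of p. This gives the steep improvement that makes diversity attractive on fading channels.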


RADAR 


During the last three years, there have been several analyses of the detection performance of specific types of radar detectors. Cohn and Peach [16] have described equipment for the direct measurement of waveform probabilities. Dilworth and Ackerlind [17] have used Monte Carlo techniques to measure the output probability distributions of filter-linear detector-integrator and filter-squarer-integrator combinations. Bussgang, Nesbeda and Safran [18] have provided a simplified analysis of sweep-integrator systems containing square-law detectors. Green [19] has analyzed the logarithmic detector and found that it is about 1 db worse than a square-law detector. Stone, Brock and Hammerle [20] have found the probability densities of the output of a filter-squarer-filter detector with constant and with Rayleigh-fading input signals.

Miller and Bernstein [21] have analyzed the first-order effects of interchannel correlation in a bank of filters covering a region of Doppler uncertainty. Their results indicate that, for idealized coherent integrators, the more filters there are, the better the system performance will be. More quantitative results on the effect of interchannel correlations have been obtained by Galejs and Cowan [22], who were able to calculate corrections to the false-alarm and incorrect-dismissal probabilities due to noise correlation in contiguous channels.

A general analysis of the radar detection problem has been given in two reports by Reed, Kelly and Root [23]. The analysis covers optimum detector synthesis and the evaluation of the performance of these detectors using orthogonal expansions. Max [24] has investigated the possibility of using mismatched filters to combat clutter. He has obtained integral equations whose solutions yield improved performance against clutter.

The detection of random signals in a noise environment of variable strength is a problem on which we can expect to see a good deal of work in the future. One study dealing with it has already been completed by Siebert [25], who discusses a constant-false-alarm-rate detector for use when the noise is of variable strength.
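One common way to hold the false-alarm rate constant is cell averaging, sketched below. Cell averaging is a standard scheme assumed here, and Siebert's detector may differ. The test cell is compared with alpha times the mean of n reference cells. If the square-law-detected noise is exponential with unknown power, choosing alpha = n(P_FA^(-1/n) - 1) gives false-alarm probability P_FA regardless of the noise level. The Monte Carlo check and the function names are hypothetical.

```python
import random

def cfar_multiplier(pfa, n_ref):
    """Cell-averaging CFAR multiplier for square-law (exponential) noise:
    threshold = alpha * mean(reference cells) gives P_FA = pfa at any noise power."""
    return n_ref * (pfa ** (-1.0 / n_ref) - 1.0)

def empirical_pfa(noise_power, pfa, n_ref, trials=40000, seed=7):
    """Fraction of noise-only trials in which the test cell crosses the threshold."""
    rng = random.Random(seed)
    alpha = cfar_multiplier(pfa, n_ref)
    alarms = 0
    for _ in range(trials):
        ref = [rng.expovariate(1.0 / noise_power) for _ in range(n_ref)]
        test_cell = rng.expovariate(1.0 / noise_power)
        if test_cell > alpha * sum(ref) / n_ref:
            alarms += 1
    return alarms / trials

if __name__ == "__main__":
    for power in (1.0, 100.0):   # a 20 db change in noise strength
        print(power, empirical_pfa(power, pfa=0.01, n_ref=16))
```

Both noise levels give a false-alarm rate near the design value of 0.01. The price of this adaptivity is a somewhat higher threshold than a detector that knew the noise power would need.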


DETECTION OF STOCHASTIC SIGNALS 


In the previous two sections, devoted to communications and to radar, we have had several occasions to refer to work on the detection of stochastic signals in noise. In this section, we shall mention several other studies of the detection of stochastic signals which are not classified as primarily communications or primarily radar studies. Strum [26] has discussed the use of microwave radiometry for detection, with special emphasis on its use in radio astronomy. In an appendix, he shows that square-law detection is slightly superior to linear (either average or peak-envelope) detection for low SNR. Kelly, …ons and Root [27] have given a more general demonstration that the square law is optimum.

Middleton [28], [29] and Kailath [30] have investigated
the detection of stochastic signals in noise and have
arrived at a form of optimum detector which may be
synthesized as a time-varying linear filter. Under some-
what more restrictive conditions, Price [31] has shown
that the optimum detector may be synthesized as a
modified type of radiometer called a “weighted radio-
meter,” which unites conventional radar notions of pre-
and postdetection sweep integration with radiometer
principles. In a paper dealing with the detection of
pulsed signals in noise, Swerling [32] obtains results for
a wide variety of signal fading characteristics. The system
considered consists of a predetection stage, a square-law
envelope detector, and a linear postdetection integrator.
The results obtained are expressions for the Laplace trans-
form of the probability density of the integrator output.
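To illustrate how such a Laplace-transform result is used, take the simplest case we can treat in closed form: N pulses, each square-law detected and normalized so that the noise-only output has unit mean, with the signal fading independently from pulse to pulse with Rayleigh amplitude (average SNR x per pulse). Each normalized output is then exponential, so the integrator output has Laplace transform (1 + s)^(−N) under noise alone and (1 + (1 + x)s)^(−N) with signal present, and both densities are gamma. The fading model and function names are our assumptions for illustration; Swerling [32] treats a wider range of fading characteristics.

```python
import math

def gamma_tail(n: int, y: float) -> float:
    """P(Y > y) for Y ~ Gamma(shape n, scale 1), n a positive integer."""
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= y / k
        total += term
    return math.exp(-y) * total

def threshold_for_pfa(n: int, pfa: float) -> float:
    """Bisect for the threshold T with P(noise-only integrator output > T) = pfa."""
    lo, hi = 0.0, 1.0
    while gamma_tail(n, hi) > pfa:   # widen the bracket until the tail drops below pfa
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_tail(n, mid) > pfa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def detection_probability(n: int, snr: float, pfa: float) -> float:
    """Pd for n integrated pulses, independent Rayleigh fading (gamma scale 1 + snr)."""
    t = threshold_for_pfa(n, pfa)
    return gamma_tail(n, t / (1.0 + snr))

for pulses in (1, 4, 16):
    print(pulses, round(detection_probability(pulses, snr=3.0, pfa=1e-6), 4))
```

For N = 1 the result reduces to P_D = P_FA^(1/(1+x)), which is a convenient check on the code.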


DETECTION EXPERIMENTS 


The period since the XII General Assembly has seen
the success of two radar detection experiments of con-
siderable importance. In February, 1958, shortly after the
time when Venus and Earth were at close approach, the
Millstone Hill radar of the Massachusetts Institute of
Technology, Lincoln Laboratory, was used in four at-
tempts to detect and range Venus [33], [34]. At each
attempt, several thousand pulses were emitted, each of
2 msec width and 440 Mc carrier frequency. The trans-
mission lasted for the Earth-Venus-Earth round-trip
travel time of about 4.5 minutes, and was followed by an
equal interval of reception. The received signal was not
processed immediately, but was sampled by a crystal-
controlled switch and recorded digitally for later process-
ing by an IBM 704 computer programmed as a weighted
radiometer.
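The quoted round-trip time fixes the Earth-Venus distance during the experiment through d = ct/2. A small worked check, assuming exactly 4.5 minutes (the text says "about") and the vacuum speed of light; the constant and function names are ours:

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # km/s
AU_KM = 149_597_870.7               # astronomical unit, km

def one_way_distance_km(round_trip_s: float) -> float:
    """Radar range from the echo round-trip delay: d = c * t / 2."""
    return SPEED_OF_LIGHT_KM_S * round_trip_s / 2.0

d = one_way_distance_km(4.5 * 60)
print(f"{d / 1e6:.1f} million km = {d / AU_KM:.3f} AU")
```

The result, roughly 40 million km or 0.27 AU, is consistent with Venus being near its closest approach at the time of the experiment.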

Members of the Radioscience Laboratory of Stanford
University were able to train an array of four rhombic
antennas on the Sun for brief periods during April, 1959,
and again during September, 1959. Several radar runs
were made, each run being a transmission of twelve
minutes' duration, followed by twelve minutes of reception.
The transmission was a sequence of alternating thirty-
second ON and thirty-second OFF intervals, with a carrier
frequency of 25.6 Mc. As with the 1958 Lincoln Laboratory
Venus attempt, the received signal was recorded, and it
took nearly a year of analysis before results could be
announced [35].


THE A PRIORI PROBLEM


Several attempts have been made during the past
three years to circumvent the a priori problem. Abramson
[36] has used some results in the theory of experiment
design to show how it is possible to say that one system
is superior to another regardless of cost assignments and
a priori message probabilities. Bellman and Kalaba [37]
have employed dynamic programming to study the learn-
ing process, and to suggest methods of obtaining a priori
probabilities when they are not known, or when they
are changing. Capon [38] has used nonparametric tech-
niques to provide an approach which is strongly invariant
to the probability distribution, based upon comparisons be-
tween the received sample and a reference sample drawn
from a noise-only population. Another example of an
attempt to deal with the a priori problem is a paper by
Schwartz, Harris and Hauptschein [39]. In this paper,
the authors introduce Carnap's concept of inductive
probability as a means of estimating the reliability of a
channel by combining a priori knowledge with the evidence
obtained from transmission. An example of the applica-
tion of this method to establish the null zones in a decision
feedback system is cited by the authors.
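As an illustration of a detector that uses only comparisons between the received sample and a noise-only reference sample, the sketch below computes a two-sample rank statistic (the Mann-Whitney count of pairs in which a received value exceeds a reference value) and compares it with a threshold from its large-sample normal approximation. We use this statistic as a representative of the class; whether it matches Capon's [38] procedure in detail is an assumption, and the function names are hypothetical. When both samples come from the same continuous population, the null distribution of the count depends only on the sample sizes, so the false-alarm rate does not depend on the noise distribution.

```python
import math
import random
from statistics import NormalDist

def mann_whitney_count(received, reference):
    """Number of (received, reference) pairs with received > reference; ties count 1/2."""
    count = 0.0
    for r in received:
        for n in reference:
            if r > n:
                count += 1.0
            elif r == n:
                count += 0.5
    return count

def rank_detector(received, reference, false_alarm=1e-3):
    """Declare 'signal present' when the count exceeds its noise-only threshold."""
    m, n = len(received), len(reference)
    mean = m * n / 2.0                                # null mean of the count
    std = math.sqrt(m * n * (m + n + 1) / 12.0)       # null std (no ties)
    threshold = mean + NormalDist().inv_cdf(1.0 - false_alarm) * std
    return mann_whitney_count(received, reference) > threshold

rng = random.Random(7)
reference = [rng.gauss(0, 1) for _ in range(200)]           # noise-only population
noise_only = [rng.gauss(0, 1) for _ in range(200)]
with_signal = [0.5 + rng.gauss(0, 1) for _ in range(200)]   # constant signal 0.5 added
print(rank_detector(noise_only, reference), rank_detector(with_signal, reference))
```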

The theory of games has been used as a model of the 
radar jamming problem by Nilsson [40]. If the transmitter 
and jammer are constrained to a certain average power, 
the problem of selecting the spectral densities of the 
transmitted and jamming signals can be treated as a 
two-person zero-sum game. 


MISCELLANEOUS


In this section, we shall discuss several topics which
have not received enough attention in the past three
years to merit a special section in this report. The theory
of sequential detection of signals is one area which seems
likely to receive a good deal more attention in the future
than it has in the past. Blasbalg [41] has formulated the
problem of detecting signals in noise by quantizing a
given random variable into two levels and using Bernoulli
sequential detection. In another paper, Blasbalg [42]
applies Wald's sequential probability ratio test to the
detection of a sine wave of arbitrary duty ratio in Gaus-
sian noise, and in still another paper [43] some experi-
mental results in sequential detection are presented.

The nonoptimum detection of distributed targets was
treated in a paper by Stewart and Westerfield [44]. In
this paper, the authors consider both reverberation and
resolution problems in the detection of sonar signals.
The loss of signal detectability caused by a hard limiter
followed by a band-pass filter was investigated by
Manasse, Price and Lerner [45]. The case of soft limiting
was investigated by Galejs [46], who used the error func-
tion as a model of a smooth limiter. Galejs was able to
calculate the SNR at the output of such a device followed
by a narrow-band filter.
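Blasbalg's formulation [41] quantizes each received sample to one bit and applies Wald's sequential probability ratio test to the resulting Bernoulli trials. A minimal sketch, assuming the probability that a sample exceeds the quantizing level is p0 under noise alone and p1 > p0 with signal present; p0, p1, the error targets, and the function name are illustrative assumptions. Wald's approximate thresholds A ≈ (1 − β)/α and B ≈ β/(1 − α) are used.

```python
import math
import random

def bernoulli_sprt(bits, p0, p1, alpha=1e-3, beta=1e-2):
    """Wald SPRT on a stream of 0/1 decisions.

    Returns ("signal" | "noise" | "undecided", number of samples used).
    """
    upper = math.log((1 - beta) / alpha)       # crossing above -> signal present
    lower = math.log(beta / (1 - alpha))       # crossing below -> noise only
    step_one = math.log(p1 / p0)               # log-likelihood increment for a 1
    step_zero = math.log((1 - p1) / (1 - p0))  # log-likelihood increment for a 0
    llr, used = 0.0, 0
    for bit in bits:
        used += 1
        llr += step_one if bit else step_zero
        if llr >= upper:
            return "signal", used
        if llr <= lower:
            return "noise", used
    return "undecided", used

rng = random.Random(1)
p0, p1 = 0.5, 0.6   # hypothetical exceedance probabilities
stream = (1 if rng.random() < p1 else 0 for _ in range(10_000))
print(bernoulli_sprt(stream, p0, p1))
```

The number of samples is itself random, which is the practical attraction of the sequential test: on average it reaches a decision with fewer samples than a fixed-length test of the same error probabilities.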


One simple alternative to making a binary decision
at the receiver is to make a ternary decision by using a
pair of thresholds instead of one. Received signals falling
in the null zone between the thresholds result in no
decision and, unless later corrected, a “blank” appears
in the output sequence. The improvement in allowable
information rate made possible by null-zone reception
was demonstrated by Bloom, et al. [47]. They show, for
example, that under certain conditions the introduction
of a single null zone achieves about half the improvement
in information rate theoretically attainable by increasing
the number of receiver levels without limit.
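The benefit of a null zone can be explored numerically. The sketch below assumes equiprobable antipodal signals ±1 in additive Gaussian noise of standard deviation σ, thresholds at ±t, and measures performance by the mutual information per decision of the resulting binary symmetric erasure channel; the model, the rate measure, and the function names are our assumptions, not necessarily those of Bloom, et al. [47]. Setting t = 0 recovers ordinary binary decisions.

```python
import math

def q_function(z: float) -> float:
    """Gaussian tail probability Q(z) = P(N(0, 1) > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def binary_entropy(p: float) -> float:
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def null_zone_rate(t: float, sigma: float) -> float:
    """Bits per decision for antipodal +/-1 signals with a null zone (-t, t)."""
    p_correct = q_function((t - 1.0) / sigma)   # y > t given +1 sent
    p_error = q_function((t + 1.0) / sigma)     # y < -t given +1 sent
    p_blank = 1.0 - p_correct - p_error
    if p_blank >= 1.0:
        return 0.0
    # Binary symmetric erasure channel, equiprobable inputs:
    # I = (1 - p_blank) * [1 - h(p_error / (1 - p_blank))]
    return (1.0 - p_blank) * (1.0 - binary_entropy(p_error / (1.0 - p_blank)))

sigma = 1.0
hard = null_zone_rate(0.0, sigma)
best_t = max((k / 100 for k in range(0, 201)), key=lambda t: null_zone_rate(t, sigma))
print(f"binary decisions: {hard:.4f} bits; best null zone t = {best_t:.2f}: "
      f"{null_zone_rate(best_t, sigma):.4f} bits")
```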

Harris, et al. [48], have investigated a number of cases
in which the transmitter is notified of each occurrence of
a “blank” and is asked for a repeat via a feedback channel
that may or may not be error free. They have investigated
the effect of terminating the process after a certain number 
of repeats, of various choices of permanent adjustments 
of the two thresholds, and of having time-varying thres- 
hold adjustments. A Gaussian distribution of the pre- 
decision noise was assumed. They also investigated the 
case where the predecision signal-to-Gaussian-noise ratio 
varies slowly as a function of time, requiring a continuous 
readjustment of threshold levels [49]. Cascades of such 
systems have also been treated [50]. Elias [51] described 
a method of supplementing a wide-band Gaussian noise 
channel with a similar analog channel in the reverse 
direction. By splitting each of these into subchannels, 
and by appropriate interconnections of the subchannels at 
both ends of the system, it is possible to reduce the com- 
plexity of coding required for forward transmission. 

An interesting question for detection by discrete data
processing was raised by Middleton [52]. It is almost
universal practice, in such detection problems, to sample
periodically, but Middleton asks whether random sampling
might offer any advantage over conventional periodic
sampling. He is able to show that in a wide variety of
cases periodic sampling is better and, on this evidence,
Middleton conjectures that periodic sampling is always
better.

The solution of detection problems, when we are given
data in a continuous closed interval, is often accomplished
by taking a limiting form of some discrete problem. The
validity of this approach was questioned in a paper by
Slepian [53]. Slepian shows that, under certain conditions
often assumed in detection studies, the detection of a
Gaussian signal in Gaussian noise can be accomplished
with arbitrarily small error. Furthermore, this detection
may be based on a sample of received signal of arbi-
trarily short duration. Work in the next three years will
undoubtedly shed more light on the problem of singular
detection.


BIBLIOGRAPHY 


[1] W. B. Davenport, Jr., and W. L. Root, “An Introduction to
the Theory of Random Signals and Noise,” McGraw-Hill
Book Co., Inc., New York, N. Y.; 1958.

[2] J. S. Bendat, “Principles and Applications of Random Noise
Theory,” John Wiley and Sons, Inc., New York, N. Y.; 1958.

[3] D. Middleton, “An Introduction to Statistical Communication
Theory,” McGraw-Hill Book Co., Inc., New York, N. Y.; 1960.

[4] C. W. Helstrom, “Statistical Theory of Signal Detection,”
Pergamon Press, Inc., New York, N. Y.; 1960.

[5] C. R. Cahn, “Performance of digital phase-modulation com-
munication systems,” IRE Trans. on Communications
Systems, vol. CS-7, pp. 3-6; May, 1959.

[6] J. G. Lawton, “Theoretical error rates of ‘differentially
coherent’ binary and ‘Kineplex’ data transmission systems,”
Proc. IRE, vol. 47, pp. 333-334; February, 1959.

[7] C. R. Cahn and J. G. Lawton, “Comparison of coherent and
phase-comparison detection of a four-phase digital signal,”
Proc. IRE, vol. 47, p. 1662 (Correspondence); September,
1959.

[8] E. Hopner, “An experimental modulation-demodulation scheme
for high speed data transmission,” IBM J. Res. and Dev., vol.
3, pp. 74-84; January, 1959.

[9] E. D. Sunde, “Ideal binary pulse transmission by AM and
FM,” Bell Sys. Tech. J., vol. 38, pp. 1357-1426; November,
1959.

[10] S. Reiger, “Error rates in data transmission,” Proc. IRE,
vol. 46, pp. 919-920; May, 1958.

[11] M. Masonson, “Binary transmission through noise and fading,”
1957 IRE National Convention Record, pt. 2; pp. 69-82.

[12] G. L. Turin, “Error probabilities for binary symmetric ideal
reception through nonselective slow fading and noise,” Proc.
IRE, vol. 46, pp. 1603-1619; September, 1958.

[13] G. L. Turin, “Some computations of error rates for selectively
fading multipath channels,” Proc. Natl. Electronics Conf.,
vol. 15, (in press).

[14] T. Kailath, “Correlation detection of signals perturbed by a
random channel,” IRE Trans. on Information Theory,
vol. IT-6, pp. 361-366; June, 1960.

[15] J. N. Pierce, “Theoretical diversity improvement in frequency-
shift keying,” Proc. IRE, vol. 46, pp. 903-910; May, 1958.

[16] G. I. Cohn and L. C. Peach, “Detection of radar signals by
direct measurement of their effects on noise statistics,” Proc.
Natl. Electronics Conf., vol. 14, pp. 821-831; October, 1959.

[17] R. P. Dilworth and E. Ackerlind, “The analysis of post-
detection integration systems by Monte Carlo methods,” 1957
IRE National Convention Record, pt. 2; pp. 40-47.

[18] J. J. Bussgang, P. Nesbeda, and H. Safran, “A unified analysis
of range performance of CW, pulse and pulse Doppler radar,”
Proc. IRE, vol. 47, pp. 1753-1769; October, 1959. (Includes a
simplified analysis for sweep-integrator systems containing
square-law detectors.)

[19] B. A. Green, Jr., “Radar detection probability with logarithmic
detectors,” IRE Trans. on Information Theory, vol. IT-4,
pp. 50-52; March, 1958.

[20] W. M. Stone, R. L. Brock, and K. J. Hammerle, “On the
first probability of detection by a radar receiver system,”
IRE Trans. on Information Theory, vol. IT-5, pp. 9-11;
March, 1959.


[21] K. S. Miller and R. I. Bernstein, “An analysis of coherent
integration and its application to signal detection,” IRE
Trans. on Information Theory, vol. IT-3, pp. 237-248;
December, 1957.

[22] J. Galejs and W. Cowan, “Interchannel correlation in a bank
of parallel filters,” IRE Trans. on Information Theory,
vol. IT-5, pp. 106-114; September, 1959.

[23] I. S. Reed, E. J. Kelly, and W. L. Root, “The Detection of
Radar Echoes in Noise. Part I: Statistical Preliminaries and
Detector Design. Part II: The Accuracy of Radar Measure-
ments,” Tech. Rept. Nos. 158 and 159, Lincoln Lab., Mass. Inst.
Tech., Lexington, Mass.; June 20, 1957 and July 19, 1957.

[24] J. Max, “Mismatching Filters to Improve Resolution in Radar,”
Grp. Rept. No. 36-32, Lincoln Lab., Mass. Inst. Tech., Lex-
ington, Mass.; October 1, 1958.

[25] W. McC. Siebert, “Some applications of detection theory to
radar,” 1958 IRE National Convention Record, pt. 4;
pp. 5-14.

[26] P. D. Strum, “Considerations in high-sensitivity microwave
radiometry,” Proc. IRE, vol. 46, pp. 48-53; January, 1958.

[27] E. J. Kelly, D. H. Lyons, and W. L. Root, “The Theory of the
Radiometer,” Grp. Rept. No. 47.16, Lincoln Lab., Mass. Inst.
Tech., Lexington, Mass.; May 2, 1958.

[28] D. Middleton, “On the detection of stochastic signals in
additive normal noise—Part I,” IRE Trans. on Information
Theory, vol. IT-3, pp. 86-121; June, 1957.

[29] D. Middleton, “On new classes of matched filters and general-
izations of the matched filter concept,” IRE Trans. on
Information Theory, vol. IT-6, pp. 349-360; June, 1960.

[30] T. Kailath, “Optimum receivers for randomly varying chan-
nels,” in “Proceedings of the Fourth London Symposium on
Information Theory,” C. Cherry, Ed., Butterworth Scientific
Publications, London, Eng.; 1961.

[31] R. Price, “Radiometer Techniques in Radar,” Lincoln Labo-
ratory, Mass. Inst. Tech., Lexington, Mass., Rept. No.
34-G-0003; June 10, 1960.

[32] P. Swerling, “Detection of fluctuating pulsed signals in the
presence of noise,” IRE Trans. on Information Theory,
vol. IT-3, pp. 175-178; September, 1957.

[33] R. Price, et al., “Radar echoes from Venus,” Science, vol. 129,
pp. 751-753; March 20, 1959.

[34] R. Price, “The Venus Radar Experiment,” printed paper
presented at the Ninth General Assembly of A.G.A.R.D.,
N.A.T.O., Aachen, Germany; September 21, 1959.

[35] V. R. Eshleman, R. C. Barthle, and P. B. Gallagher, “Radar
echoes from the sun,” Science, vol. 131, pp. 329-332; February
5, 1960.

[36] N. Abramson, “The application of ‘comparison of experiments’
to detection problems,” 1958 IRE National Convention
Record, pt. 4; pp. 22-26.

[37] R. Bellman and R. Kalaba, “On the role of dynamic pro-
gramming in statistical communication theory,” IRE Trans.
on Information Theory, vol. IT-3, pp. 197-203; September,
1957.

[38] J. Capon, “A nonparametric technique for the detection of a
constant signal in additive noise,” 1959 IRE WESCON
Convention Record, pt. 4; pp. 92-103.

[39] L. S. Schwartz, B. Harris, and A. Hauptschein, “Information
rate from the viewpoint of inductive probability,” 1959 IRE
National Convention Record, pt. 4; pp. 102-111.

[40] N. J. Nilsson, “An application of the theory of games to
radar reception problems,” 1959 IRE National Convention
Record, pt. 4; pp. 130-140.

[41] H. Blasbalg, “The relation of sequential filter theory to in-
formation theory and its application to the detection of signals
in noise by Bernoulli trials,” IRE Trans. on Information
Theory, vol. IT-3, pp. 122-131; June, 1957.

[42] H. Blasbalg, “The sequential detection of a sine-wave carrier
of arbitrary duty ratio in Gaussian noise,” IRE Trans. on
Information Theory, vol. IT-3, pp. 248-256; December,
1957.

[43] H. Blasbalg, “Experimental results in sequential detection,”
IRE Trans. on Information Theory, vol. IT-5, pp. 41-51;
June, 1959.

[44] J. L. Stewart and E. C. Westerfield, “A theory of active sonar
detection,” Proc. IRE, vol. 47, pp. 872-881; May, 1959.

[45] R. Manasse, R. Price, and R. M. Lerner, “Loss of signal
detectability in band-pass limiters,” IRE Trans. on Informa-
tion Theory, vol. IT-4, pp. 34-38; March, 1958.

[46] J. Galejs, “Signal-to-noise ratios in smooth limiters,” IRE
Trans. on Information Theory, vol. IT-5, pp. 79-85;
June, 1959.

[47] F. J. Bloom, et al., “Improvement of binary transmission by
null-zone reception,” Proc. IRE, vol. 45, pp. 963-975; July,
1957.

[48] B. Harris, A. Hauptschein, and L. S. Schwartz, “Optimum
decision feedback circuits,” 1957 IRE National Convention
Record, pt. 2; pp. 3-10. See also Operations Res., vol. 5,
pp. 680-692; October, 1957.

[49] B. Harris, et al., “Binary decision feedback systems for main-
taining reliability under conditions of varying field strength,”
Proc. Natl. Electronics Conf., vol. 13, pp. 126-140; October, 1957.

[50] J. J. Metzner, “Binary relay communication with decision
feedback,” 1959 IRE National Convention Record, pt. 4;
pp. 112-119.

[51] P. Elias, “Channel capacity without coding,” 1957 IRE
National Convention Record, pt. 2; p. 49. (Abstract only.)
Complete text in Quart. Progress Rept., Res. Lab. of Electronics,
Mass. Inst. Tech., Cambridge, Mass., pp. 90-93; October 25,
1956.

[52] D. Middleton, “A comparison of random and periodic data
sampling for the detection of signals in noise,” IRE Trans. on
Information Theory, vol. IT-5, pp. 234-247; May, 1959.

[53] D. Slepian, “Some comments on the detection of Gaussian
signals in Gaussian noise,” IRE Trans. on Information
Theory, vol. IT-4, pp. 65-68; June, 1958.

PART 5: PREDICTION AND FILTERING 


L. A. ZADEH†, FELLOW, IRE


MUCH OF THE research on prediction and
filtering conducted in the United States during
the period 1957-1960 was concerned essentially
with various extensions of Wiener's theory. In particular,
extensions involving nonstationary continuous-time proc-
esses, vector-valued processes, stationary and nonstation-
ary discrete-time processes, non-Gaussian processes,
incompletely specified processes, and nonlinear filters and
predictors have received attention.

A new and very promising direction in prediction 
theory has been opened by the application of Bellman’s 
dynamic programming to the determination of optimal 
adaptive filters and predictors. Actually, the basic work
of Bellman and Kalaba [1]-[3], and its extensions and
applications by Freimer [4], Aoki [5], Kalman and Koepcke
[6], and Merriam [7] are not concerned with prediction
and filtering as such. However, the recent work of Kalman
[41] shows that, mathematically, there is a duality between
the filtering problem and the control problems considered
by Bellman and Kalaba, and others. Thus, these contri-
butions are likely to have a considerable impact on the
course of the development of the theory of filtering and
prediction in the years ahead; they point toward an
increasing utilization of digital computers and the con-
cepts and techniques of discrete-state systems both in
the design of predicting and filtering schemes and in their
implementation.

During the past two years, four books containing in
aggregate a substantial amount of material on prediction
and filtering have been published. Davenport and Root [8]
present a clear exposition of Wiener's theory and some of
its extensions. Wiener's [9] monograph discusses orthogo-
nal expansions of nonlinear functionals, but stops short
of applying them to prediction problems. Bendat [10]
presents a general survey of linear prediction and treats
some special problems in considerable detail. Middleton's
[11] weighty treatise contains a thorough exposition of
the classical prediction theory together with a theory
of reception in which the problems of prediction and
filtering are formulated in the framework of decision
theory. The appendix of Middleton's book includes an
informative section on the solution of the Wiener-Hopf
equation and some of its variants.

† Dept. of Elec. Engrg., University of California, Berkeley,
Calif.

A more detailed discussion of the contributions to 
filtering and prediction theory is presented in the following 
pages. For convenience, the subjects of nonlinear filter- 
ing, nonstationary and discrete-time filtering, and miscel- 
laneous contributions are dealt with separately. 


NONLINEAR FILTERING 


The contributions to nonlinear filtering and prediction
have centered largely on the fundamental work of Wiener
[12] and its earlier extensions by Bose [13] and Barrett [14].
A discernible trend in the research in this area is to con-
sider special types of processes for which optimal non-
linear filters assume a simple form. A key work in this
connection is that of Barrett and Lampard [15], in which
the class $A$¹ of all second-order density functions admitt-
ing a diagonal representation of the form


$$p(x_1, x_2; \tau) = p(x_1)\,p(x_2) \sum_{n=0}^{\infty} a_n(\tau)\,\theta_n(x_1)\,\theta_n(x_2) \tag{1}$$


is introduced. Here $p(x_1, x_2; \tau)$ denotes the second-order
density of a stationary process $\{x(t)\}$, with $x_1 = x(t)$ and
$x_2 = x(t + \tau)$; $p(x)$ is the first-order density, and $\{\theta_n(x)\}$
is a family of polynomials with the orthogonality property

$$\int_{-\infty}^{\infty} p(x)\,\theta_m(x)\,\theta_n(x)\,dx = \delta_{mn}. \tag{2}$$


In particular, Barrett and Lampard have shown that
Gaussian and Rayleigh processes are of this type, with
the $\theta_n$ being Hermite and Laguerre polynomials, re-
spectively. Convergence and other aspects of the Barrett-
Lampard expansion were investigated by Leipnik [16],
while necessary and sufficient conditions under which
$p(x_1, x_2; \tau)$ can be expressed in the form (1) have been
given by J. L. Brown [17]. Brown also studied [18] a
more general class of densities for which the expansion
(1) is nondiagonal and the coefficients $a_{mn}(\tau)$ are restricted
by the relation $a_{1m}(\tau) = d_m\,a_{11}(\tau)$, $m = 1, 2, \ldots$, the $d_m$
being real constants. As shown by Brown, processes with
densities of this type exhibit a number of interesting
properties.
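For the Gaussian case the diagonal expansion (1) is Mehler's formula: for a unit-variance process with correlation coefficient ρ, one may take θ_n(x) = He_n(x)/√(n!) (probabilists' Hermite polynomials, which satisfy the orthonormality property above with respect to the standard normal density) and a_n(τ) = ρ(τ)^n. The sketch below checks a truncated sum numerically against the bivariate normal density; the truncation length and function names are our choices.

```python
import math

def normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def orthonormal_hermite(x: float, count: int) -> list:
    """theta_n(x) = He_n(x)/sqrt(n!), n = 0..count-1, via He_{n+1} = x He_n - n He_{n-1}."""
    values = [1.0, x]
    for n in range(1, count - 1):
        values.append(x * values[n] - n * values[n - 1])
    return [values[n] / math.sqrt(math.factorial(n)) for n in range(count)]

def bivariate_normal_pdf(x1: float, x2: float, rho: float) -> float:
    det = 1.0 - rho * rho
    quad = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def barrett_lampard_sum(x1: float, x2: float, rho: float, terms: int = 60) -> float:
    """Truncated expansion p(x1) p(x2) sum_n rho**n theta_n(x1) theta_n(x2)."""
    t1 = orthonormal_hermite(x1, terms)
    t2 = orthonormal_hermite(x2, terms)
    series = sum(rho ** n * a * b for n, (a, b) in enumerate(zip(t1, t2)))
    return normal_pdf(x1) * normal_pdf(x2) * series

print(barrett_lampard_sum(0.3, -0.7, 0.5), bivariate_normal_pdf(0.3, -0.7, 0.5))
```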

One way in which the Barrett-Lampard expansion can
be used in nonlinear filtering was pointed out by Zadeh
[19]. Specifically, assume that the second-order density
of a process with zero mean can be represented by (1),
with the $\theta_n(x)$ not necessarily having the form of poly-
nomials. Then, if an optimal (minimum variance) filter is
sought in the class of filters admitting the representation

$$y(t) = \sum_{n=0}^{\infty} \int_{0}^{\infty} K_n(\tau)\,\theta_n[x(t - \tau)]\,d\tau, \tag{3}$$

where the $K_n(\tau)$ are undetermined kernels, and the desired
output is written as

$$z(t) = \sum_{m \in M} \int_{-\infty}^{\infty} K_m^*(\tau)\,\theta_m[x(t - \tau)]\,d\tau, \tag{4}$$

where $M$ is a finite index set and the $K_m^*(\tau)$ are given
kernels, the determination of the $K_n(\tau)$ reduces to the
solution of a finite number of Wiener-Hopf integral
equations

$$\int_{0}^{\infty} K_n(\tau)\,a_n(\tau_1 - \tau)\,d\tau = \int_{-\infty}^{\infty} K_n^*(\tau)\,a_n(\tau_1 - \tau)\,d\tau, \qquad \tau_1 \ge 0,\; n \in M, \tag{5}$$

with $K_n = 0$ if $n \notin M$.

¹ In Barrett and Lampard's definition of $A$, $p(x_1, x_2; \tau)$ is not
assumed to be symmetrical.

Another type of process, for which the problem of
determining an optimal nonlinear predictor is greatly
simplified, was introduced by Nuttall [20]. Specifically,
Nuttall calls a process separable² if the conditional ex-
pectation of $x_2$ given $x_1$ can be represented as

$$\int_{-\infty}^{\infty} x_2\,p(x_2 \mid x_1)\,dx_2 = (x_1 - \mu)\,\rho(\tau) + \mu, \tag{6}$$

where $\mu$ is the mean value of the process and $\rho(\tau)$ is its
normalized autocorrelation function. Separable processes
form a slightly broader class than that defined by Brown
[18].

Among the many interesting properties of separable
processes is the following prediction property. Let s(t)
be a signal mixed with additive noise. Then, if {s(t)} is a
separable process, the best estimate of s(t + τ) in terms
of the best estimate of s(t) is given by

$$s^*(t + \tau) = [s^*(t) - \mu_s]\,\rho_s(\tau) + \mu_s, \tag{7}$$

where $\rho_s(\tau)$ and $\mu_s$ are the normalized autocorrelation and
the mean value of the signal process, and starred quantities
represent optimal (minimum variance) estimates. In the
absence of noise, the explicit formula for the best pre-
dictor in terms of s(t) becomes

$$s^*(t + \tau) = [s(t) - \mu_s]\,\rho_s(\tau) + \mu_s. \tag{8}$$
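A stationary Gaussian process is separable in Nuttall's sense, since for jointly normal variables the conditional mean of x_2 given x_1 is μ + ρ(τ)(x_1 − μ); the noise-free predictor (8) is then the best predictor. The sketch below assumes a sampled Gauss-Markov signal with ρ_s(k) = a^|k| (a first-order autoregression, our choice of example), applies (8) k steps ahead, and compares the empirical mean-square error with the value σ_s²[1 − ρ_s(k)²] expected for the conditional mean. All names are hypothetical.

```python
import math
import random

def separable_predictor(s_now: float, rho: float, mean: float) -> float:
    """Equation (8): s*(t + tau) = [s(t) - mu_s] rho_s(tau) + mu_s."""
    return (s_now - mean) * rho + mean

def simulate_gauss_markov(n: int, a: float, mean: float, var: float, seed: int = 3):
    """Stationary AR(1): s_k - mean = a (s_{k-1} - mean) + w_k, so rho_s(k) = a**|k|."""
    rng = random.Random(seed)
    w_std = math.sqrt(var * (1.0 - a * a))   # keeps the stationary variance equal to var
    s = [mean + rng.gauss(0.0, math.sqrt(var))]
    for _ in range(n - 1):
        s.append(mean + a * (s[-1] - mean) + rng.gauss(0.0, w_std))
    return s

a, mean, var, lead = 0.9, 2.0, 1.0, 5
s = simulate_gauss_markov(200_000, a, mean, var)
rho = a ** lead
errors = [s[k + lead] - separable_predictor(s[k], rho, mean) for k in range(len(s) - lead)]
mse = sum(e * e for e in errors) / len(errors)
print(f"empirical MSE {mse:.4f}, theory {var * (1 - rho * rho):.4f}")
```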


Still another type of process for which the prediction
problem is manageable was considered by D. A. George
[21]. Here the observed signal f(t) is assumed to be the
output of an invertible nonlinear system N preceded by
an invertible linear system L to which a white Gaussian
signal x(t) is applied. Thus, symbolically, $f = NLx$ and
$x = L^{-1}N^{-1}f$. Then, if an optimal estimate of $f(t + \alpha)$
is denoted by $\hat f(t + \alpha)$, it is not difficult to find an operator
$H_\alpha$ acting on the present and past values of x(t) such that
$\hat f(t + \alpha) = H_\alpha[x(t)]$. Once $H_\alpha$ has been found, $\hat f(t + \alpha)$
can be expressed in terms of the present and past values
of f(t) by the relation $\hat f(t + \alpha) = H_\alpha L^{-1} N^{-1} f$.

² It should be noted that the term “separable process” is used in
the theory of stochastic processes in an altogether different sense.
While some authors have sought to simplify the pre-
diction problem by considering processes with special
properties, others have turned to special types of non-
linear operators. In particular, the work of Bose [13], [22]
was extended by D. A. Chesler [23] to operators of the
form $F(\sum_n c_n \varphi_n)$, where F denotes either a linear operator
with memory, or a nonlinear memoryless operator, or a
more general nonlinear operator possessing an inverse; the
$c_n$ are adjustable constants, and the $\varphi_n$ are nonlinear opera-
tors such that the expectation $E\{\varphi_m(x)\varphi_n(x)\} = 0$ for
$m \ne n$, x being the input to the filter. As was shown by
Bose, in the absence of F the optimal value of each $c_i$ can
be determined by measuring the mean-square error as a
function of, say, $c_i$ and assigning to $c_i$ the value which
minimizes the mean-square error. This method is shown
by Chesler to be applicable also when F is a linear operator
or a nonlinear operator with no memory. The extension
is less straightforward when the only assumption on F
is that it possesses a realizable inverse.
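Bose's one-coefficient-at-a-time procedure works because, when E{φ_m(x)φ_n(x)} = 0 for m ≠ n, the mean-square error is a sum of separate quadratics in the c_n, so minimizing over each c_n in turn reaches the joint optimum in one pass. The sketch below assumes a unit-variance Gaussian input and takes the φ_n to be He_1, He_2, He_3 (orthogonal for that input); it "measures" the sample mean-square error at three trial values of one coefficient, fits a parabola, and sets the coefficient to its vertex. The target system, the trial values, and all names are illustrative assumptions.

```python
import random

def hermite_features(x: float) -> list:
    """phi_1..phi_3 = He_1, He_2, He_3, orthogonal when x is N(0, 1)."""
    return [x, x * x - 1.0, x ** 3 - 3.0 * x]

def mean_square_error(coeffs, features, targets) -> float:
    total = 0.0
    for phi, y in zip(features, targets):
        estimate = sum(c * p for c, p in zip(coeffs, phi))
        total += (y - estimate) ** 2
    return total / len(targets)

def adjust_one_at_a_time(features, targets, count: int) -> list:
    coeffs = [0.0] * count
    for i in range(count):
        trial = []
        for value in (-1.0, 0.0, 1.0):       # three measurements of the MSE
            coeffs[i] = value
            trial.append(mean_square_error(coeffs, features, targets))
        f_minus, f_zero, f_plus = trial
        # The MSE is exactly quadratic in c_i, so the parabola vertex is its minimizer.
        coeffs[i] = (f_minus - f_plus) / (2.0 * (f_minus + f_plus - 2.0 * f_zero))
    return coeffs

rng = random.Random(1)
inputs = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
features = [hermite_features(x) for x in inputs]
true_coeffs = [2.0, -0.5, 0.1]              # hypothetical system to be matched
targets = [sum(c * p for c, p in zip(true_coeffs, phi)) + rng.gauss(0.0, 0.1)
           for phi in features]
estimated = adjust_one_at_a_time(features, targets, 3)
print(estimated)
```

With a finite sample the φ_n are only approximately orthogonal, so the one-pass estimates differ slightly from a full least-squares fit; the difference shrinks as the sample grows.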
In all the foregoing analyses the signal process is
assumed to be stationary. However, there are many
situations of practical interest in which an appropriate
representation for the signal is a series of the form

$$s(t) = \sum_{i} a_i\,\varphi_i(t), \tag{9}$$

in which the $\varphi_i(t)$ are known functions of time and the
$a_i$ are unknown constants or random variables. In such
cases, the problem of filtering or predicting s(t) reduces
to the estimation of the coefficients $a_i$.

It was shown some time ago by Laning [24] that when
1) the noise is additive, stationary, and Gaussian, 2) the
joint distribution of the $a_i$ is known, and 3) the loss
function L(e) is non-negative and vanishes for e = 0,
optimal estimators for the $a_i$ are memoryless nonlinear
functions of linear combinations of values of the input
over the interval of observation. In a recent paper,
similar results were obtained by a different and more
rigorous method by Kallianpur [25]. More specifically, for
the case where the interval of observation is [0, T] and
the loss function is quadratic, Kallianpur derived explicit
expressions for the best estimate of s(t) at time T + τ,
in terms of n linear functionals of the form $\int_0^T x(t)\,p_i(t)\,dt$,
$i = 1, 2, \ldots, n$, where x(t) is the sum of signal and noise,
and the $p_i(t)$ are square-integrable solutions of the integral
equations

$$\int_0^T R(t - \tau)\,p_i(\tau)\,d\tau = \varphi_i(t), \qquad i = 1, 2, \ldots, n,\quad 0 \le t \le T, \tag{10}$$

in which $R(\tau)$ is the correlation function of the process.

More concrete results for the same general problem
were obtained by Middleton [26], and by Glaser and Park [27].
In particular, Middleton found explicit expressions for
minimum variance estimators of the $a_i$ for the cases
where 1) the $a_i$ are jointly normally distributed, 2) the
$a_i$ are independent and Rayleigh distributed, 3) the $a_i$
are independent and their distributions are not sym-
metrical, and 4) the $a_i$ are independent and their distri-
butions are symmetrical. Of these cases, only 1) and 4)
yield linear estimators for the $a_i$.

The relation between maximum likelihood, minimum
variance, and least squares estimates of the $a_i$ was studied
in earlier papers by Mann [28], and Mann and Moranda
[29]. A number of interesting properties of minimum
variance estimates of s(t) and its derivatives for the case
where the $\varphi_i(t)$ are polynomials in t were found by I.
Kanter [30], [31]. A central result of Kanter is that an
optimal weighting function for predicting the jth deriva-
tive of an nth-degree polynomial can be expressed uniquely
and simply in terms of optimal estimators of kth
derivatives of kth-degree polynomials, with k ranging
between j and n.
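For the jointly normal case 1), the minimum-variance estimate of the coefficients in (9) is linear in the observations. A discrete-time sketch, assuming samples x_k = s(t_k) + n_k with white Gaussian noise of variance σ² (a simplification of the stationary noise in the text), independent zero-mean Gaussian coefficients with prior variances λ_i, and the polynomial basis φ_i(t) = t^i of the kind Kanter considered: the estimate solves (ΦᵀΦ + σ²Λ⁻¹)â = Φᵀx, and the predicted value at T + τ is Σ â_i φ_i(T + τ). All names are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def estimate_coefficients(times, samples, basis, noise_var, prior_vars):
    """Posterior mean of Gaussian a_i given x_k = sum_i a_i phi_i(t_k) + n_k."""
    phi = [[f(t) for f in basis] for t in times]
    m = len(basis)
    normal = [[sum(row[i] * row[j] for row in phi) for j in range(m)] for i in range(m)]
    for i in range(m):
        normal[i][i] += noise_var / prior_vars[i]    # prior term sigma^2 / lambda_i
    rhs = [sum(row[i] * x for row, x in zip(phi, samples)) for i in range(m)]
    return solve_linear(normal, rhs)

def predict(coeffs, basis, t):
    return sum(c * f(t) for c, f in zip(coeffs, basis))

basis = [lambda t: 1.0, lambda t: t, lambda t: t * t]   # phi_i(t) = t**i
times = [k / 10 for k in range(11)]                     # observation interval [0, T], T = 1
actual = [1.0, -2.0, 0.5]
samples = [predict(actual, basis, t) for t in times]    # noise-free data as a check
est = estimate_coefficients(times, samples, basis, noise_var=1e-10, prior_vars=[1e6] * 3)
print(est, predict(est, basis, 1.5))                    # prediction at T + tau = 1.5
```

When the noise variance is large compared with the prior variances, the estimates shrink toward the prior mean of zero, as one expects from a minimum-variance rule.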


FILTERING AND PREDICTION OF NONSTATIONARY,
DISCRETE-TIME, AND MIXED PROCESSES


As is well known [32], extensions of Wiener's theory
to nonstationary processes lead to integral equations of
the general form

$$\int_a^b R(t, \tau)\,h(\tau)\,d\tau = g(t), \qquad a \le t \le b, \tag{11}$$

in which $R(t, \tau)$ is the covariance function of the observed
process. Little can be done toward the solution of this
equation when $R(t, \tau)$ is an arbitrary covariance function.
Thus, contributions to the theory of prediction of non-
stationary continuous-time processes consist essentially
of methods of solving (11) in special cases.

Along these lines, Shinbrot [33] discussed the solution
of (11) for the case where $R(t, \tau)$ can be expressed in the
form

$$R(t, \tau) = \sum_{n=1}^{N} p_n(t)\,q_n(\tau). \tag{12}$$

Using Shinbrot's methods, the solution of (11) reduces
to the solution of a system of differential equations with
time-varying coefficients. There is some advantage in
such a reduction when one has available a differential
analyzer or an equivalent machine. Similar results are
yielded by a theory due to Darlington [34], [35], in which
many of the concepts and techniques of time-invariant
networks are extended to time-varying networks. As in
the paper of Miller and Zadeh [32], a key assumption
in these approaches is that the observed process may be
generated by acting on white noise with a product of
differential and inverse-differential operators, or equiva-
lently, with a lumped-parameter linear time-varying
network. Darlington's paper [34] also contains a simpli-
fied technique for finding a finite-memory Wiener filter
for stationary signal and noise.

A special case for which an explicit solution can be found
has been studied by Bendat [36]. Here the basic assump-
tion is that the signal is of the form s(t) = 0 for t < 0,
$s(t) = \sum_n (a_n \cos n\omega t + b_n \sin n\omega t)$ for t > 0, where the
$a_n$ and $b_n$ are random variables with known covariance
matrices, while the covariance function of the noise is
of the form

$$R(t_1, t_2) = \begin{cases} A\,e^{-\alpha |t_1 - t_2|} \cos \beta (t_1 - t_2), & t_1 \ge 0,\ t_2 \ge 0, \\ 0, & t_1 < 0 \text{ or } t_2 < 0. \end{cases} \tag{13}$$
Closely related cases in which the prediction problem
can be solved completely are those in which the non-
stationarity of signal and noise processes is due to a
truncation (e.g., multiplying the signal and noise by a
step function) of stationary processes. This is also true
in the case of discrete-time processes, as is demonstrated
by several examples in Friedland's [37] extension of
Wiener's theory to nonstationary sampled-data processes.

Several interesting results concerning the linear pre-
diction or filtering of stationary discrete-time processes
were described by Blum [38]-[40]. In particular, Blum has
developed recursive formulas which express the estimate
at time n in terms of a finite number of past estimates
and past values of the observed process. This type of
representation is especially useful in connection with
so-called growing-memory filters, i.e., filters which act
on the entire past of the input. Thus, if the input sequence
(starting at t = 0) is denoted by $x_0, x_1, \ldots, x_n$ and the
filter output at time n is denoted by $z_n$, then $z_n$ is ex-
pressible as $z_n = \sum_{\nu=0}^{n} c_\nu x_\nu$, in which the $c_\nu$ depend on n.
A shortcoming of this representation is that as time
advances the $c_\nu$ have to be recomputed at each step and
their number grows with n. On the other hand, a recursive
relation (if it exists) is of the form

$$z_n = a_1 z_{n-1} + a_2 z_{n-2} + \cdots + a_k z_{n-k} + b_0 x_n + b_1 x_{n-1} + \cdots + b_e x_{n-e}, \tag{14}$$

where the a's, b's, k and e are constants independent of
n and, hence, need not be recomputed. One complication
in this approach to the problem is that in order to start
the recursion one must know initially $z_0, z_1, \ldots, z_{k-1}$.
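The computational point can be seen in the simplest case. Assuming, for illustration only, a first-order constant-coefficient recursion z_n = a z_(n−1) + b x_n with nothing stored before the first sample (so z_0 = b x_0), unrolling it gives the growing-memory form z_n = Σ c_ν x_ν with c_ν = b a^(n−ν): the weights change with n and their number grows, while the recursion uses the same two constants at every step. The sketch computes the output both ways; the coefficients are hypothetical and not taken from Blum's papers.

```python
def growing_memory_output(xs, a, b):
    """z_n = sum_{nu=0}^{n} c_nu x_nu, weights c_nu = b * a**(n - nu) recomputed for each n."""
    outputs = []
    for n in range(len(xs)):
        weights = [b * a ** (n - nu) for nu in range(n + 1)]   # n + 1 weights at time n
        outputs.append(sum(c * x for c, x in zip(weights, xs[: n + 1])))
    return outputs

def recursive_output(xs, a, b):
    """z_n = a z_{n-1} + b x_n with constant a, b; only the last output is stored."""
    outputs, z = [], 0.0
    for x in xs:
        z = a * z + b * x      # with z = 0 before the first sample, z_0 = b x_0
        outputs.append(z)
    return outputs

xs = [1.0, 0.5, -0.25, 2.0, 0.0, 1.5]
print(recursive_output(xs, a=0.8, b=0.2))
```

The nonrecursive form performs n + 1 multiplications at time n, whereas the recursion performs two, which is the practical advantage noted above.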

A somewhat related but more general approach has 
been formulated recently by Kalman [41]. Specifically, 
KXalman assumes that the observed process is an n-di- 
mensional vector process {y(t)} which is generated by 
acting with a linear discrete-time system on a white 
noise {u(t)}; thus, 


y(t) = P(t)x(2) 
x(t + 1) = G(x) + ul), 


where X(¢) and y(t) are vectors and P(t) and G(t) are 
given time-varying matrices. (This assumption is analo- 


(15) 


INFORMATION THEORY 


gous to the usual one in the case of nonstationary co) 
tinuous-time prediction, viz., that the observed proce 
can be generated by acting on white noise with a tim 
varying network.) Kalman shows that an optimal (min 
mum variance) estimate of x(t) is given by the recursi 
relation 


$$x^*(t+1) = [G(t) - A(t)P(t)]\,x^*(t) + A(t)\,y(t), \tag{16}$$

where

$$A(t) = G(t)M(t)P'(t)\,[P(t)M(t)P'(t)]^{-1}, \tag{17}$$

and M(t) is given by

$$M(t+1) = [G(t) - A(t)P(t)]\,M(t)\,G'(t) + Q(t), \tag{18}$$

where G' is the transpose of G and Q(t) is the covariance
matrix $Q(t) = E\{u(t)u'(t)\}$. The matrix M(t) is the
expectation of the matrix $e(t)e'(t)$, where e(t) is the error
at time t. In this formulation, to start the recursion one
must know $x^*(0)$ and M(0). However, in most cases the
effect of the initial choices of $x^*(0)$ and M(0) will be
insignificant by the time the system reaches its steady
state.
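The recursion (16)-(18) translates directly into a few matrix operations. The sketch below assumes a hypothetical time-invariant system with a two-dimensional state and a noise-free scalar observation, as in (15); G, P, Q and the starting values $x^*(0)$ and M(0) are illustrative choices, not taken from Kalman's paper. Starting from an arbitrary M(0), the run also shows M(t) settling to a steady state, as stated above.

```python
import numpy as np


def kalman_step(x_est, M, y, G, P, Q):
    """One step of (16)-(18): from x*(t), M(t) and y(t) = P x(t), return x*(t+1), M(t+1)."""
    A = G @ M @ P.T @ np.linalg.inv(P @ M @ P.T)   # gain A(t), eq. (17)
    F = G - A @ P                                   # G(t) - A(t) P(t)
    x_next = F @ x_est + A @ y                      # eq. (16)
    M_next = F @ M @ G.T + Q                        # eq. (18)
    return x_next, M_next


# Hypothetical time-invariant example: two-dimensional state, scalar observation.
rng = np.random.default_rng(0)
G = np.array([[1.0, 1.0], [0.0, 1.0]])
P = np.array([[1.0, 0.0]])
Q = np.diag([0.01, 0.001])          # covariance of the white noise u(t)

x = np.zeros(2)                     # true state x(t)
x_est = np.zeros(2)                 # x*(0), chosen arbitrarily
M = np.eye(2)                       # M(0), chosen arbitrarily
for _ in range(200):
    y = P @ x                                             # observation, eq. (15)
    x_est, M = kalman_step(x_est, M, y, G, P, Q)
    x = G @ x + rng.multivariate_normal(np.zeros(2), Q)   # state update, eq. (15)

print(M)   # M(t) after 200 steps, essentially at its steady-state value
```

Because the observation in (15) carries no noise of its own, the matrix $P(t)M(t)P'(t)$ inverted in (17) must stay nonsingular; in this example the positive first diagonal entry of Q guarantees that.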

An interesting observation made by Kalman is that
the prediction problem, in his formulation, is dual to a
problem in control theory in which the objective is to
find an input which minimizes a quadratic loss function.

In addition to extensions of Wiener's theory to
nonstationary continuous- and discrete-time processes,
extensions to processes of mixed type were also reported.
In particular, Robbins [42] solved the mean-square
optimization problem for the case where the filter consists
of a linear time-invariant system followed by a sampler,
which is followed in turn by another linear time-invariant
system. Janos [43] gave a complete analysis of the case
where a stationary signal is multiplied by a train of
rectangular pulses, yielding a periodic pulse-modulated
time series. The filter is assumed to be a time-invariant
linear network. The integral equation satisfied by the
impulse response of the optimum filter is of the
Wiener-Hopf type, but a multiplying factor involving trains of
rectangular pulses complicates its solution. A method of
solution of this equation is given by Janos for the
infinite-memory as well as the finite-memory case.


MISCELLANEOUS CONTRIBUTIONS


There are several not necessarily unimportant problems
in filtering and prediction which have received relatively
little attention during the period under review.
Contributions concerned with such problems are discussed
briefly in this section.

It has long been recognized that the use of a quadratic
loss function imposes a serious limitation on the
applicability of Wiener's theory. Under certain conditions,
however, optimality under the mean-square-error criterion
implies optimality under a wide class of criteria. Such
conditions have been found by Benedict and Sondhi [44]
and, independently, by Sherman [45]. Thus, Benedict and
Sondhi have shown that in the case of a Gaussian process,
optimality with respect to a loss function of the form
$L = \epsilon^2$, where ε denotes the error, implies optimality
with respect to any loss function of the form $L = \sum_n |\epsilon|^n$,
where n > 0 but is not restricted to integral values. In
Sherman's result, $L = f(\epsilon)$ is an even function and
$\epsilon \ge \alpha \ge 0$ implies $f(\epsilon) \ge f(\alpha)$. More special cases
involving the design of optimal filters under
non-mean-square error criteria have been considered by
Bergen [46] and Wernikoff [47]. A time-weighted
mean-square-error criterion which can be used to reduce the
settling time of an optimal linear filter was employed by Ule [48].
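Behind these results is the fact that, for Gaussian processes, the conditional distribution of the quantity being estimated is Gaussian, hence symmetric and unimodal about the mean-square optimal (conditional-mean) estimate. A small numerical illustration of the scalar case, with a hypothetical mean μ and standard deviation σ: for X ~ N(μ, σ²), the constant c that minimizes $E|X - c|^p$ comes out at μ for every exponent tried, integral or not. This is an illustration of the principle, not a reproduction of either paper's proof.

```python
import numpy as np


def expected_loss(c, mu, sigma, p, x):
    """Riemann-sum approximation of E|X - c|**p for X ~ N(mu, sigma**2)."""
    dx = x[1] - x[0]
    pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return np.sum(np.abs(x - c) ** p * pdf) * dx


# Hypothetical Gaussian conditional distribution of the quantity being estimated.
mu, sigma = 1.3, 0.7
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)   # integration grid
candidates = np.linspace(mu - 1.0, mu + 1.0, 201)          # trial estimates c

best = {}
for p in (0.5, 1.0, 2.0, 3.0):
    losses = [expected_loss(c, mu, sigma, p, x) for c in candidates]
    best[p] = candidates[int(np.argmin(losses))]
    print(f"p = {p}: minimizing estimate = {best[p]:.3f}")
```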

An extension of Wiener's theory to random parameter
systems was described by Beutler [49]. In Beutler's
formulation, the signal and noise are assumed to have
passed through a time-invariant random linear system
before being available for application to a filter or
predictor. The linear system is assumed to be characterized
by a transfer function H(ω, γ), in which γ is a random
parameter with a known distribution. In effect, this
amounts to modifying the statistical characteristics of the
original signal and noise processes.
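One way to see the "modified statistical characteristics" at the level of second-order statistics is sketched below, under assumptions that are not taken from Beutler's paper: γ is drawn once, independently of the signal, from a hypothetical three-point distribution, and H(ω, γ) = 1/(1 + jω/γ) is a hypothetical first-order system. The power spectrum presented to the filter is then $E_\gamma|H(\omega,\gamma)|^2\,S(\omega)$, as if a fixed system with the averaged squared gain had been used.

```python
import numpy as np


def effective_spectrum(S, H, omegas, gammas, probs):
    """E_gamma |H(w, gamma)|^2 * S(w): power spectrum seen after a random-parameter
    linear system, assuming gamma is drawn independently of the input process."""
    w = np.asarray(omegas, dtype=float)[:, None]
    g = np.asarray(gammas, dtype=float)[None, :]
    mean_gain = np.sum(np.abs(H(w, g)) ** 2 * np.asarray(probs)[None, :], axis=1)
    return mean_gain * S(np.asarray(omegas, dtype=float))


def H(w, gamma):
    """Hypothetical first-order low-pass whose corner frequency gamma is random."""
    return 1.0 / (1.0 + 1j * w / gamma)


def S(w):
    """Hypothetical spectrum of the original signal."""
    return 1.0 / (1.0 + w ** 2)


gammas = [0.5, 1.0, 2.0]
probs = [0.25, 0.5, 0.25]           # hypothetical distribution of gamma
omegas = np.linspace(0.0, 5.0, 6)   # 0, 1, ..., 5
S_eff = effective_spectrum(S, H, omegas, gammas, probs)
print(np.round(S_eff, 4))
```

A complete optimal-filter design would also need the cross-spectrum between the system output and the original signal, which involves $E_\gamma H(\omega,\gamma)$ rather than the averaged squared gain.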


The multiple series prediction problem for the infinite
memory case was considered by Hsieh and Leondes [50].
In their paper, Hsieh and Leondes describe a simplified
method of solving the simultaneous integral equations for
the weighting functions. Their technique is not applicable,
however, to the finite memory case.

The optimization of continuous-time filters and
predictors is frequently carried out by discretizing time and
then letting the interval between successive samples
approach zero. There are many published papers in
which limiting processes of this type are used without
adequate justification. A careful and rigorous analysis
of the problems involved in obtaining optimum
continuous-time linear estimates as limits of discrete-time
estimates was given by Swerling [51].




BIBLIOGRAPHY 


[1] R. Bellman and R. Kalaba, "On communication processes involving learning and random duration," 1958 IRE National Convention Record, pt. 4; pp. 16-21.

[2] R. Bellman and R. Kalaba, "On adaptive control processes," 1959 IRE National Convention Record, pt. 4; pp. 3-11. Reprinted in IRE Trans. on Automatic Control, vol. AC-4, pp. 1-9; November, 1959.

[3] R. Bellman and R. Kalaba, "Dynamic programming and adaptive processes: mathematical foundation," IRE Trans. on Automatic Control, vol. AC-5, pp. 5-10; January, 1960.

[4] M. Freimer, "A dynamic programming approach to adaptive control processes," 1959 IRE National Convention Record, pt. 4; pp. 12-17. Reprinted in IRE Trans. on Automatic Control, vol. AC-4, pp. 10-15; November, 1959.

[5] M. Aoki, "Dynamic programming and numerical experimentation as applied to adaptive control systems," Dept. of Engrg., University of California, Los Angeles, Calif.; February, 1960.

[6] R. E. Kalman and R. W. Koepcke, "Optimal synthesis of linear sampling control systems using generalized performance indexes," Trans. ASME, vol. 80, pp. 1820-1826; 1958.

[7] C. W. Merriam, "A class of optimum control systems," J. Franklin Inst., vol. 267, pp. 267-281; April, 1959.

[8] W. B. Davenport, Jr., and W. L. Root, "An Introduction to the Theory of Random Signals and Noise," McGraw-Hill Book Co., Inc., New York, N. Y.; 1958.


Elias, et al.: Progress in Information Theory in the U.S.A., 1957-1960 




[9] N. Wiener, "Nonlinear Problems in Random Theory," The Technology Press, Cambridge, Mass., and John Wiley and Sons, Inc., New York, N. Y.; 1958.

[10] J. S. Bendat, "Principles and Applications of Random Noise Theory," John Wiley and Sons, Inc., New York, N. Y.; 1958.

[11] D. Middleton, "An Introduction to Statistical Communication Theory," McGraw-Hill Book Co., Inc., New York, N. Y.; 1960.

[12] N. Wiener, "Mathematical problems of communication theory," Summer Session Lecture Notes, Mass. Inst. Tech., Cambridge, Mass.; 1953.

[13] A. Bose, "A Theory of Nonlinear Systems," Res. Lab. of Electronics, Mass. Inst. Tech., Cambridge, Mass., Rept. No. 309; May, 1956.

[14] J. Barrett, "Application of the Theory of Functionals to Communication Problems," Cambridge University, Cambridge, England, Engrg. Lab. Rept.; 1955.

[15] J. Barrett and D. Lampard, "An expansion for some second order probability distributions and its application to noise problems," IRE Trans. on Information Theory, vol. IT-1, pp. 10-15; March, 1955.

[16] R. Leipnik, "Integral equations, biorthonormal expansions, and noise," SIAM J., vol. 7, pp. 6-30; March, 1959.

[17] J. L. Brown, Jr., "A criterion for the diagonal expansion of a second-order probability distribution in orthogonal polynomials," IRE Trans. on Information Theory, vol. IT-4, p. 172; December, 1958. (Correspondence.)

[18] J. L. Brown, Jr., "On a cross-correlation property for stationary random processes," IRE Trans. on Information Theory, vol. IT-3, pp. 28-31; March, 1957.

[19] L. Zadeh, "On the representation of nonlinear operators," 1957 IRE WESCON Convention Record, pt. 2; pp. 105-113.

[20] A. H. Nuttall, "Theory and application of the separable class of random processes," Res. Lab. of Electronics, Mass. Inst. Tech., Cambridge, Mass., Rept. No. 343; May, 1958.

[21] D. A. George, "The Prediction of Gaussian-Derived Signals," Res. Lab. of Electronics, Mass. Inst. Tech., Cambridge, Quart. Prog. Rept., pp. 107-109; July, 1958.

[22] A. Bose, "Nonlinear system characterization and optimization," Trans. 1959 Internat. Symp. on Circuit and Information Theory. Reprinted in IRE Trans. on Circuit Theory, vol. CT-6, pp. 30-40; May, 1959.

[23] D. A. Chesler, "Optimum Nonlinear Filters with Fixed-Output Networks," Res. Lab. of Electronics, Mass. Inst. Tech., Cambridge, Quart. Prog. Rept., pp. 118-124; July, 1958.

[24] J. H. Laning, Jr., "Prediction and Filtering in the Presence of Gaussian Interference," Instrumentation Lab., Mass. Inst. Tech., Cambridge, Rept. No. R-27; October, 1951.

[25] G. Kallianpur, "A problem in optimum filtering with finite data," Ann. Math. Stat., vol. 30, pp. 659-669; September, 1959.

[26] D. Middleton, "A note on the estimation of signal waveform," IRE Trans. on Information Theory, vol. IT-5, pp. 86-89; June, 1959.

[27] E. M. Glaser and J. H. Park, Jr., "On signal parameter estimation," IRE Trans. on Information Theory, vol. IT-4, pp. 173-174; December, 1958.

[28] H. B. Mann, "A theory of estimation for the fundamental random process and the Ornstein-Uhlenbeck process," Sankhyā, vol. 13, pt. 4, pp. 325-350; June, 1954.

[29] H. B. Mann and P. B. Moranda, "On the efficiency of the least squares estimates of parameters in the Ornstein-Uhlenbeck process," Sankhyā, vol. 13, pt. 4, pp. 351-358; June, 1954.

[30] I. Kanter, "The prediction of derivatives of polynomial signals in additive stationary noise," 1958 IRE WESCON Convention Record, pt. 4; pp. 131-146.

[31] I. Kanter, "Some new results for the prediction of derivatives of polynomial signals in additive stationary noise," 1959 IRE WESCON Convention Record, pt. 4; pp. 87-91.

[32] K. S. Miller and L. A. Zadeh, "Solution of an integral equation occurring in the theories of prediction and detection," IRE Trans. on Information Theory, vol. IT-2, pp. 72-75; June, 1956.

[33] M. Shinbrot, "A generalization of a method for the solution of the integral equation arising in optimization of time-varying linear systems with nonstationary inputs," IRE Trans. on Information Theory, vol. IT-3, pp. 220-224; December, 1957.

[34] S. Darlington, "Linear least-squares smoothing and prediction, with applications," Bell Sys. Tech. J., vol. 37, pp. 1221-1294; September, 1958.

[35] S. Darlington, "Nonstationary smoothing and prediction using network theory concepts," Trans. 1959 Internat. Symp. on Circuit and Information Theory. Reprinted in IRE Trans. on Circuit Theory, vol. CT-6, pp. 1-13; May, 1959.

[36] J. S. Bendat, "Exact integral equation solutions and synthesis for a large class of optimum time variable linear filters," IRE Trans. on Information Theory, vol. IT-3, pp. 71-80; March, 1957.
[37] B. Friedland, "Least squares filtering and prediction of nonstationary sampled data," Inform. and Control, vol. 1, pp. 297-313; December, 1958.

[38] M. Blum, "Fixed memory least squares filters using recursive methods," IRE Trans. on Information Theory, vol. IT-3, pp. 178-182; September, 1957.

[39] M. Blum, "Recursion formulas for growing memory digital filters," IRE Trans. on Information Theory, vol. IT-4, pp. 24-30; March, 1958.

[40] M. Blum, "On the mean square noise power of an optimum linear discrete filter operating on polynomial plus white noise input," IRE Trans. on Information Theory, vol. IT-3, pp. 225-231; December, 1957.

[41] R. Kalman, "A new approach to linear filtering and prediction problems," J. Basic Engrg., vol. 82D, pp. 35-45; March, 1960.

[42] H. M. Robbins, "An extension of Wiener filter theory to partly sampled systems," IRE Trans. on Circuit Theory, vol. CT-6, pp. 362-370; December, 1959.

[43] W. A. Janos, "Optimal filtering of periodic pulse-modulated time series," IRE Trans. on Information Theory, vol. IT-5, pp. 67-74; June, 1959.


[44] T. R. Benedict and M. M. Sondhi, "On a property of Wiener filters," Proc. IRE, vol. 45, pp. 1021-1022; July, 1957.

[45] S. Sherman, "Non-mean-square error criteria," IRE Trans. on Information Theory, vol. IT-4, pp. 125-126; September, 1958.

[46] A. R. Bergen, "A non-mean-square-error criterion for the synthesis of optimum finite memory sampled-data filters," 1957 IRE National Convention Record, pt. 2; pp. 26-32.

[47] R. E. Wernikoff, "A theory of signals," Res. Lab. of Electronics, Mass. Inst. Tech., Cambridge, Rept. No. 331; January, 1958.

[48] L. A. Ule, "A theory of weighted smoothing," IRE Trans. on Information Theory, vol. IT-3, pp. 131-135; June, 1957.

[49] F. J. Beutler, "Prediction and filtering for random parameter systems," IRE Trans. on Information Theory, vol. IT-4, pp. 166-171; December, 1958.

[50] H. C. Hsieh and C. T. Leondes, "On the optimum synthesis of multiple control systems in the Wiener sense," 1959 IRE National Convention Record, pt. 4; pp. 18-31. Reprinted in IRE Trans. on Automatic Control, vol. AC-4, pp. 16-29; November, 1959.

[51] P. Swerling, "Optimum linear estimation for random processes as the limit of estimates based on sampled data," 1958 IRE WESCON Convention Record, pt. 4; pp. 158-163.


On the Approach of a Filtered Pulse Train to a Stationary Gaussian Process*


PHILLIP BELLO†, ASSOCIATE, IRE


Summary—A narrow-band process is conveniently characterized 
in terms of a complex envelope whose magnitude is the envelope, 
and whose angle is the phase variation of the actual narrow-band 
process. When the narrow-band process is normally distributed, 
the complex envelope has the properties of a complex normally 
distributed process. This paper investigates the approach to the 
complex normally distributed form of the complex envelope of 
the output of a narrow-band filter when the input is wide-band 
non-Gaussian noise of a certain class, and the bandwidth of the
narrow-band filter approaches zero. The non-Gaussian input con- 
sists of a train of pulses having identical waveshapes, but random 
amplitudes and phases. While the derivations assume statistical 
independence between pulses, it is shown that the results are valid 
for a certain interesting class of dependent pulses. The Central 
Limit Theorem is proved in the multidimensional case for the 
output process. 


I. INTRODUCTION


THERE exist many situations in radar and communication
problems in which the Central Limit Theorem is invoked
to support the assumption that the output of a
narrow-band filter with a wide-band non-Gaussian input
possesses Gaussian statistics. However, little analytical
work appears to have been done toward justifying this
Gaussian assumption. This paper investigates the
approach to stationary Gaussian statistics of the output
of a narrow-band filter whose input is a sequence of
pulses of random amplitude and phase. It is assumed
that the pulses are of identical shape and occur
periodically in time at a rate $f_r$ per second.

* Received by the PGIT, June 21, 1960.
† Applied Research Lab., Sylvania Electronic Systems, Waltham, Mass.


II. THE NARROW-BAND GAUSSIAN PROCESS


A narrow-band process N(t) centered at $f_0$ cps is
representable in the form

$$N(t) = \mathrm{Re}\{v(t)\,e^{j2\pi f_0 t}\}, \tag{1}$$

where a property of v(t), defined here as the complex
envelope of N(t), is that its magnitude is the conventional
envelope of N(t), while its angle is the conventional
phase variation of N(t) about the carrier phase $\omega_0 t$. The
notation Re{x} denotes the real part of x.
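For sampled data, representation (1) can be computed directly. The sketch below assumes a hypothetical test signal with slow amplitude and phase modulation on a 50-cps carrier. The pre-envelope is formed with the FFT by suppressing negative frequencies (the standard discrete analytic-signal construction), and v(t) follows by multiplying by $e^{-j2\pi f_0 t}$; |v| and arg v then recover the conventional envelope and phase.

```python
import numpy as np


def pre_envelope(x):
    """Pre-envelope (analytic signal) of a real sequence: keep the DC and
    Nyquist bins, double the positive frequencies, zero the negative ones."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def complex_envelope(x, f0, fs):
    """v(t) = pre-envelope(t) * exp(-j 2 pi f0 t), so x(t) = Re{v(t) exp(j 2 pi f0 t)}."""
    t = np.arange(len(x)) / fs
    return pre_envelope(x) * np.exp(-2j * np.pi * f0 * t)


# Hypothetical narrow-band test signal: a 50-cps carrier with slow AM and PM.
fs, f0 = 1000.0, 50.0
t = np.arange(1000) / fs
a = 1.0 + 0.5 * np.cos(2 * np.pi * 2.0 * t)     # conventional envelope
phi = 0.3 * np.sin(2 * np.pi * 1.0 * t)         # conventional phase variation
N = a * np.cos(2 * np.pi * f0 * t + phi)

v = complex_envelope(N, f0, fs)
print(np.max(np.abs(np.abs(v) - a)), np.max(np.abs(np.angle(v) - phi)))
```

The record length is chosen to hold whole periods of every component, so the FFT's implicit periodicity introduces no edge error.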

When N(t) is a stationary Gaussian process, it is
readily demonstrated that v(t) has the properties of a
stationary complex normally distributed process. The
properties of a complex normal process are discussed by
Doob¹ and Arens.² Arens deals with the pre-envelope of
N(t), which corresponds to $v(t)e^{j2\pi f_0 t}$ in our case.

¹ J. L. Doob, "Stochastic Processes," John Wiley and Sons, Inc., New York, N. Y., pp. 71-78; 1953.

² R. Arens, "Complex processes for envelopes of normal noise," IRE Trans. on Information Theory, vol. IT-3, pp. 204-207; September, 1957.

Because we are here specifically interested in narrow-band
processes, it is convenient to emphasize this by dealing
with the complex envelope rather than with the
pre-envelope of N(t). The typical jointly normal probability
density function and characteristic function for complex
variates are given by Arens.³ For our purposes, it will
be necessary to present only the characteristic function
of the N jointly stationary random variables $v(t_i)$, i = 1,
2, ..., N:


$$F(\lambda_1, \ldots, \lambda_N) = \exp\Big[-\frac{1}{4}\sum_{p,q=1}^{N} \lambda_p^{*}\,\lambda_q\,R_v(t_p - t_q)\Big], \tag{2}$$

where the λ's are the characteristic function variables,
the asterisk indicates the complex conjugate, and

$$R_v(\tau) = E[v^{*}(t)\,v(t+\tau)] = R_v^{*}(-\tau) \tag{3}$$

is defined⁴ as the autocorrelation function of v(t).
Assuming that v(t) has zero mean value, it must (according
to the definition of a complex normal variate) satisfy
the condition

$$E[v(t)\,v(t+\tau)] = 0. \tag{4}$$

In the following sections, we will examine the complex
envelope z(t) of a filtered random pulse train to determine
the conditions under which the z(t) process may be said
to have properties approaching those indicated in (2)-(4).
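A Monte Carlo sketch of (2)-(4) for N = 1, assuming the characteristic-function convention $E\exp(j\,\mathrm{Re}\{w^{*}\lambda\})$ that the paper sets out in (38) and (41): a circular complex normal variate with $E|w|^2 = 1$ should have characteristic function $\exp(-|\lambda|^2/4)$ and $E[w^2] \approx 0$. The sample size, seed, and test value of λ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
# Circular complex normal samples with E|w|^2 = 1: independent real and
# imaginary parts, each of variance 1/2.
w = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2.0)


def char_fn(samples, lam):
    """Sample estimate of E exp(j Re{w* lam})."""
    return np.mean(np.exp(1j * np.real(np.conj(samples) * lam)))


lam = 1.5 - 0.8j
estimate = char_fn(w, lam)
theory = np.exp(-0.25 * abs(lam) ** 2)   # (2) with N = 1 and R_v(0) = 1
print("E|w|^2    ~", np.mean(np.abs(w) ** 2))
print("E[w^2]    ~", np.mean(w * w))     # cf. (4) with tau = 0
print("char. fn. ~", estimate, "vs", theory)
```

If the real and imaginary parts had unequal variances, $E[w^2]$ would no longer vanish and the characteristic function would depend on the direction of λ, which is exactly the departure that (4) rules out.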


III. REPRESENTATION OF OUTPUT SIGNAL


In the subsequent discussion, we will deal entirely
with complex time functions (complex envelopes and
pre-envelopes) because of the resulting simplification in
derivations. It is convenient to conceive of the (complex)
pulse train as being generated at the output of a "pulse"
filter by using a random complex-area impulse train
$i_r(t)$ as input, where

$$i_r(t) = \sum_{k} y_k\,\delta(t - kT_r), \tag{5}$$

δ(t) is a unit impulse at t = 0, $T_r = 1/f_r$, and $y_k$ is a
random complex variable. While the subsequent analysis
will assume that $y_k$ is independent of $y_j$, j ≠ k, and
identically distributed, the results of the analysis are
actually applicable to a practically meaningful class of
dependent y's. Specifically, the analysis is valid for the
dependent case if $y_k$ may be expressed as the output of a
(time-invariant) discrete filter whose input sequence
satisfies the independence requirement. This fact is seen
most clearly in Fig. 1, where the output pre-envelope
$z(t)e^{j2\pi f_0 t}$ is shown as being obtained by three successive
filtering operations on $i_r(t)$. The first filter has an impulse
response i(t)d(t), where i(t) is the unit impulse train

$$i(t) = \sum_{k} \delta(t - kT_r), \tag{6}$$

and d(t) is the pre-envelope of a continuous filter. Thus,
the output of the first filter is a random complex-area
impulse train with dependent complex areas. The second
filter has an impulse response p(t) which is one half⁵
the pre-envelope of a typical normalized pulse. Then the
output of the second filter is a periodic train of pulses
of identical shape but having random amplitude and
phase. The last filter is a narrow-band filter whose
impulse response $u_1(t)e^{j2\pi f_0 t}$ is one half the pre-envelope of
the physical narrow-band filter. Since this filter is
"centered"⁶ at $f_0$, $u_1(t)$ is one half the complex envelope
of the filter impulse response.

³ Ibid., see (11) and footnote 9.
⁴ E[ ] will be used to denote an ensemble average.

Since the complex envelope of a narrow-band process
centered at $f_0$ cps may be found by multiplication of the
pre-envelope by $e^{-j2\pi f_0 t}$, and since this constitutes a
spectrum shift of $f_0$ cps toward the origin, one may
quickly verify that the output complex envelope z(t)
may be obtained as shown in Fig. 2.


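[Fig. 1 block diagram: $i_r(t)$ → discrete filter $i(t)d(t)$ → pulse filter $p(t)$ → narrow-band filter $u_1(t)e^{j2\pi f_0 t}$ → output pre-envelope.]

Fig. 1—Representation of output pre-envelope.

[Fig. 2 block diagram: $i_r(t)e^{-j2\pi f_0 t}$ → $i(t)d(t)e^{-j2\pi f_0 t}$ → $p(t)e^{-j2\pi f_0 t}$ → $u_1(t)$ → $z(t)$.]

Fig. 2—Representation of output complex envelope.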


For convenience in analysis, one may combine the
three filters of Fig. 2 into one equivalent filter with
impulse response u(t), where

$$u(t) = [i(t)\,d(t)\,e^{-j2\pi f_0 t}] \otimes [p(t)\,e^{-j2\pi f_0 t}] \otimes u_1(t), \tag{7}$$

and the symbol ⊗ denotes convolution. The conditions
on convergence of z(t) to a complex normally distributed
process need then be stated only in terms of the equivalent
filter.

In terms of u(t), it may be seen that z(t) is given by

$$z(t) = \sum_{k=-\infty}^{\infty} y_k\,u(t - kT_r)\,e^{-j2\pi f_0 kT_r}. \tag{8}$$

It will be presumed at the outset that u(t) is bounded;
otherwise z(t) will become infinite periodically.


IV. AVERAGE REQUIREMENTS


Two requirements on z(t) which are necessary for it
to be a stationary complex normally distributed process are

$$E[z^{*}(t)\,z(t+\tau)] = R_z(\tau) \tag{9}$$

and

$$E[z(t)\,z(t+\tau)] = 0. \tag{10}$$

⁵ The factor one half is needed since the pre-envelope of a filter output is one half the convolution of the pre-envelope of the input with the pre-envelope of the filter impulse response.

⁶ "Centered" here means only that some convenient choice for $f_0$ has been made within the pass band of the filter.


These requirements will be called the average requirements.
It is readily demonstrated that if, for a normally
distributed process, the average requirements are not satisfied,
many well-known properties of narrow-band Gaussian
processes may not be obtained. For instance, the
envelope may not be Rayleigh-distributed, or the phase may
not be uniform over a 2π interval. This section is
concerned with determining the conditions on the input
impulse train and (equivalent) narrow-band filter leading
to satisfaction by z(t) of the average requirements. It
should be noted that (assuming $E[|z(t)|^2]$ finite) (9) is
just the requirement that z(t) be a wide-sense stationary
complex-valued random process.
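Let the averages

$$E[|y_k|^2] = 1, \qquad E[y_k^2] = \beta \tag{11}$$

be defined. Then, the $y_k$ being taken to have zero mean,
it is quickly determined that

$$E[z^{*}(t)\,z(t+\tau)] = \sum_{k} u^{*}(t - kT_r)\,u(t + \tau - kT_r) = R_z(t, t+\tau). \tag{12}$$

By using i(t), the unit impulse train [see (6)], one finds
that

$$R_z(t, t+\tau) = i(t) \otimes u^{*}(t)\,u(t+\tau). \tag{13}$$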


At this point, a frequency-domain interpretation
becomes indispensable. Let the spectrum of $u^{*}(t)u(t+\tau)$
be defined as

$$X(f, \tau) = \int_{-\infty}^{\infty} u^{*}(t)\,u(t+\tau)\,e^{-j2\pi f t}\,dt. \tag{14}$$

It is interesting to note that this spectrum X(f, τ) is
identical in form to Woodward's⁷ ambiguity function for
a radar pulse u(t). Thus, it has several interesting
properties. The reader is referred to the literature for these
properties.⁸⁻¹⁰
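For a hypothetical Gaussian pulse $u(t) = 2^{1/4}\sqrt{BT_r}\,\exp(-\pi B^2 t^2)$, scaled so that $(1/T_r)\int|u|^2\,dt = 1$, (14) can be evaluated in closed form by completing the square: $X(f,\tau) = T_r\,e^{-\pi B^2\tau^2/2}\,e^{-\pi f^2/(2B^2)}\,e^{j\pi f\tau}$. The sketch below checks this against direct numerical integration, together with the ambiguity-function bound $|X(f,\tau)| \le X(0,0)$, which follows from the Schwarz inequality. The pulse shape and the values of B and $T_r$ are illustrative assumptions.

```python
import numpy as np

T_r, B = 1.0, 0.5      # hypothetical pulse period and pulse "bandwidth"
t = np.linspace(-40.0, 40.0, 160001)
dt = t[1] - t[0]


def u(t):
    """Hypothetical real Gaussian pulse, scaled so (1/T_r) * int |u|^2 dt = 1."""
    return 2 ** 0.25 * np.sqrt(B * T_r) * np.exp(-np.pi * (B * t) ** 2)


def X_numeric(f, tau):
    """Eq. (14) evaluated by a Riemann sum on the grid t."""
    integrand = np.conj(u(t)) * u(t + tau) * np.exp(-2j * np.pi * f * t)
    return np.sum(integrand) * dt


def X_closed(f, tau):
    """Closed form of (14) for this pulse (obtained by completing the square)."""
    return (T_r * np.exp(-np.pi * (B * tau) ** 2 / 2)
            * np.exp(-np.pi * f ** 2 / (2 * B ** 2))
            * np.exp(1j * np.pi * f * tau))


for f, tau in [(0.0, 0.0), (0.3, 0.7), (1.0, -1.2)]:
    print(f, tau, X_numeric(f, tau), X_closed(f, tau))
```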

The spectrum of the unit impulse train i(t) is the
frequency-domain impulse train given by

$$I(f) = \frac{1}{T_r}\sum_{m=-\infty}^{\infty} \delta(f - m f_r). \tag{15}$$

Thus, the spectrum of $R_z(t, t+\tau)$, defined as $P_z(f, \tau)$,
is given by

$$P_z(f, \tau) = I(f)\,X(f, \tau) = \frac{1}{T_r}\sum_{m=-\infty}^{\infty} X(m f_r, \tau)\,\delta(f - m f_r). \tag{16}$$

⁷ P. M. Woodward, "Probability and Information Theory," McGraw-Hill Book Company, Inc., New York, N. Y.; 1953.

⁸ M. Lerner, "Signals with uniform ambiguity functions," 1958 IRE National Convention Record, pt. 4; pp. 27-36.

⁹ W. M. Siebert, "A radar detection philosophy," IRE Trans. on Information Theory, vol. IT-2, pp. 204-221; September, 1956.

¹⁰ W. M. Siebert, "Studies of Woodward Uncertainty Function," Res. Lab. of Electronics, Mass. Inst. Tech., Cambridge, Quart. Prog. Rept.; April 15, 1958.


Reverting back to the time domain,

$$R_z(t, t+\tau) = \frac{1}{T_r}X(0, \tau) + \frac{1}{T_r}\sum_{m \ne 0} X(m f_r, \tau)\,e^{j2\pi m f_r t}. \tag{17}$$

Examination of (17) shows that $R_z(t, t+\tau)$ is periodic
in t with a period $T_r$. In fact, (17) is just its Fourier
series expansion. Thus, for $R_z(t, t+\tau)$ to be time
independent, the fundamental and all harmonics must vanish:

$$|X(m f_r, \tau)| = 0 \quad \text{for } m \ne 0, \tag{18}$$

yielding

$$R_z(t, t+\tau) = \frac{1}{T_r}\int_{-\infty}^{\infty} u^{*}(t)\,u(t+\tau)\,dt = R_z(\tau) = R_z^{*}(-\tau). \tag{19}$$


Let us assume that the spectrum of u(t) is limited to
a band B cps wide, centered at zero frequency. Then it
is quickly determined that X(f, τ) cannot extend beyond
a band 2B cps wide, centered at zero frequency. Let $B_-$
and $B_+$ denote the lower and upper limits, respectively,
of this band. Then (18), and thus (19), will be satisfied if

$$\max\{|B_-|, B_+\} < f_r. \tag{20}$$

Fig. 3 shows I(f) and |X(f, τ)| when (20) is satisfied.

[Fig. 3: plot against frequency, f.]

Fig. 3—Relation between filter pass band and pulse-train frequency
for wide-sense stationarity of z(t).


Since a physical impulse response cannot be strictly
band limited, it is clear that $R_z(t, t+\tau)$ can never be
exactly time independent. However, it is also clear in
the practical situation that its time dependence will be
negligible if the "bandwidth" of u(t) (defined in an
appropriate sense) is sufficiently small.
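The periodicity in (17) and the role of (20) can be seen numerically. The sketch below assumes a hypothetical real Gaussian equivalent filter (not band limited, so (20) can only hold approximately) and $T_r = 1$. It evaluates the lattice sum (12) at several instants t within one period. With B = 0.2 $f_r$ the t-dependence is at rounding level and the sum matches the closed form $R_z(\tau) = \exp(-\pi B^2\tau^2/2)$ that (19) gives for this pulse; with B = $f_r$ the harmonic $X(f_r, \tau)$ is no longer negligible and $R_z(t, t+\tau)$ visibly varies with t.

```python
import numpy as np

T_r = 1.0     # hypothetical pulse period, so f_r = 1


def u(t, B):
    """Hypothetical equivalent filter: real Gaussian pulse of "bandwidth" B,
    scaled so that (1/T_r) * int |u|^2 dt = 1."""
    return 2 ** 0.25 * np.sqrt(B * T_r) * np.exp(-np.pi * (B * t) ** 2)


def R_sum(t, tau, B, K=200):
    """Eq. (12): sum over |k| <= K of u*(t - k T_r) u(t + tau - k T_r)."""
    k = np.arange(-K, K + 1)
    return np.sum(np.conj(u(t - k * T_r, B)) * u(t + tau - k * T_r, B))


def R_stationary(tau, B):
    """Eq. (19) for this pulse, in closed form."""
    return np.exp(-np.pi * (B * tau) ** 2 / 2)


tau = 0.3
ts = np.linspace(0.0, T_r, 11)          # sample t over one period
for B in (0.2, 1.0):                    # B well below f_r, then B = f_r
    vals = np.array([R_sum(t, tau, B) for t in ts])
    print(f"B = {B}: spread over t = {np.ptp(vals):.2e}, "
          f"mean = {vals.mean():.6f}, eq. (19) = {R_stationary(tau, B):.6f}")
```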

Consider now the other average requirement,

$$E[z(t)\,z(t+\tau)] = \beta\sum_{k} e^{-j4\pi f_0 kT_r}\,u(t - kT_r)\,u(t + \tau - kT_r) = \beta\,\tilde R_z(t, t+\tau). \tag{21}$$

Except for the special case β = 0, this expectation will
not vanish unless the summation vanishes. This summation
may be represented in the time domain as

$$\tilde R_z(t, t+\tau) = [i(t)\,e^{-j4\pi f_0 t}] \otimes u(t)\,u(t+\tau), \tag{22}$$

or in the frequency domain as

$$\tilde P_z(f, \tau) = I(f + 2f_0)\,\tilde X(f, \tau), \tag{23}$$

where $\tilde P_z(f, \tau)$ is the spectrum of $\tilde R_z(t, t+\tau)$, and $\tilde X(f, \tau)$
is the spectrum of $u(t)u(t+\tau)$,

$$\tilde X(f, \tau) = \int_{-\infty}^{\infty} u(t)\,u(t+\tau)\,e^{-j2\pi f t}\,dt. \tag{24}$$

Reverting to the time domain,

$$\tilde R_z(t, t+\tau) = \frac{1}{T_r}\sum_{m=-\infty}^{\infty} \tilde X(m f_r - 2f_0, \tau)\,e^{j2\pi(m f_r - 2f_0)t} = e^{-j4\pi f_0 t}\,\frac{1}{T_r}\sum_{m=-\infty}^{\infty} \tilde X(m f_r - 2f_0, \tau)\,e^{j2\pi m f_r t}. \tag{25}$$

The function $\tilde R_z(t, t+\tau)$ is the product of the periodic
function $\exp(-j4\pi f_0 t)$, of period $1/2f_0$, by a summation
which is periodic with period $T_r$. The Fourier coefficients
of this sum are just $\tilde X(m f_r - 2f_0, \tau)$. Thus, in order for
$\tilde R_z(t, t+\tau)$ to vanish (for satisfaction of the second
average requirement), it must be that

$$\tilde X(m f_r - 2f_0, \tau) = 0 \quad \text{for all } m. \tag{26}$$


If u(t) is band limited, then it may be shown that $\tilde X(f, \tau)$
of (24) cannot extend beyond the same range as X(f, τ).
Using the same definitions for $B_-$ and $B_+$ as for (20),
it may readily be seen that (26), and thus (10), will be
satisfied if the following two inequalities are satisfied:

$$B_+ < f_+, \qquad B_- > f_-, \tag{27}$$

where

$$f_+ = (m_0 + 1) f_r - 2f_0, \qquad f_- = m_0 f_r - 2f_0, \tag{28}$$

in which

$$m_0 = \max\{m : 2f_0 - m f_r > 0\}. \tag{29}$$


Fig. 4 shows $I(f - 2f_0)$ and $|\tilde X(f, \tau)|$ when (27) is
satisfied. In the general case, it should be noted that if either
$f_+$ or $f_-$ falls very close to zero frequency, then the second
average condition (10) will not be satisfied unless the filter
bandwidth is very small. If the transfer function of the
equivalent filter is symmetrical and the gain drops to
zero monotonically at the band edges, it is clear that,
as far as making the sum $\tilde R_z(t, t+\tau)$ of (21) small is
concerned, a desirable condition exists when $f_+ = |f_-| = f_r/2$.
It may also be demonstrated that when the filter transfer
function is symmetrical, when $f_+ = |f_-| = f_r/2$, and when
the $y_k$'s are real, i.e., purely amplitude-modulated pulses,
then the real and imaginary parts of z(t) become statistically
independent processes.


[Fig. 4: plot against frequency, f.]

Fig. 4—Relation between filter pass band and pulse-train frequency
for satisfaction of second average requirement.
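The quantities in (28)-(29) are easy to tabulate. The sketch below locates the lattice points $m f_r - 2f_0$ nearest zero and applies the inequalities (27) to a band $(B_-, B_+)$ of $\tilde X(f, \tau)$. The numbers are hypothetical and expressed in units of $f_r$: $f_0 = 10.25$ gives the symmetric choice $f_+ = |f_-| = f_r/2$ favored in the text, while $f_0 = 10$ (an integral multiple of $f_r/2$) is the singular case in which $f_+ = 0$ and (27) cannot hold.

```python
import math


def image_offsets(f0, f_r):
    """Eqs. (28)-(29): the lattice points m*f_r - 2*f0 nearest zero.

    m0 is the largest m with 2*f0 - m*f_r > 0, so f_minus < 0 <= f_plus.
    """
    m0 = math.ceil(2 * f0 / f_r) - 1
    f_minus = m0 * f_r - 2 * f0
    f_plus = (m0 + 1) * f_r - 2 * f0
    return f_minus, f_plus


def second_requirement_ok(f0, f_r, B_minus, B_plus):
    """Inequalities (27): the band (B_minus, B_plus) of X~(f, tau) must lie
    strictly between f_minus and f_plus, so that X~(m f_r - 2 f0, tau) = 0."""
    f_minus, f_plus = image_offsets(f0, f_r)
    return f_minus < B_minus and B_plus < f_plus


# Hypothetical numbers, in units of f_r = 1:
print(image_offsets(10.25, 1.0))                      # symmetric case
print(second_requirement_ok(10.25, 1.0, -0.2, 0.2))
print(second_requirement_ok(10.0, 1.0, -0.2, 0.2))    # 2 f0 = 20 f_r: singular case
```

For values of $2f_0/f_r$ that are integers only up to floating-point rounding, `math.ceil` can land on either side, so an exact singular case is best tested with exactly representable inputs as above.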


To summarize this section briefly, simple
frequency-domain inequalities (20) and (27) are presented which
determine when the average requirements are satisfied
by the complex envelope of the filtered pulse train for
the case in which the equivalent narrow-band filter is
strictly band limited. When the narrow-band filter is
not strictly band limited, the average requirements cannot
be exactly satisfied. However, except in one singular case,
they may be satisfied to any degree of precision if the
"bandwidth" (suitably defined) of the narrow-band filter
is sufficiently small. This singular case occurs when the
center frequency of the narrow-band filter is an integral
multiple of half the repetition frequency of the input
pulse train. In this case, the second average requirement
(10) can never be satisfied unless either

$$E[y_k^2] = \beta = 0, \quad \text{or} \quad \int_{-\infty}^{\infty} u(t)\,u(t+\tau)\,dt = 0 \ \text{for all } \tau, \tag{30}$$

which are rather special situations.

In the non-band-limited case, one may still define
frequency limits $B_+$ and $B_-$ beyond which the spectra
X(f, τ) of (14) and $\tilde X(f, \tau)$ of (24) are negligibly small.
Then the inequalities (20) and (27) are useful in determining
whether the average requirements may be regarded as being satisfied.


V. CENTRAL LIMIT THEOREM


In addition to satisfying the average requirements,
the joint characteristic function of $z(t_1), z(t_2), \ldots, z(t_N)$
must be of the form of (2) if z(t) is to be a complex
normally distributed stationary process. This section will
demonstrate that the following two conditions are
sufficient for the convergence of the joint characteristic function
of $z(t_1), z(t_2), \ldots, z(t_N)$ to the stationary complex normally
distributed form [provided the second average requirement
of (10) is satisfied and u(t) is bounded]:

$$E[|y_k|^3] < \infty, \qquad \int_{-\infty}^{\infty} |u(t)|^2\,dt < \infty. \tag{31}$$
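The condition $E[|y_k|^3] < \infty$ plays the role of the third-moment (Lyapunov) condition of the classical central limit theorem. At a single instant, $z(t_1) = \sum_m y_m c_m$ with $c_m = u(t_1 - mT_r)e^{-j2\pi f_0 mT_r}$, so Lyapunov's ratio is $\beta_3 \sum_m |c_m|^3 / (\sum_m |c_m|^2)^{3/2}$, where $\beta_3 = E[|y_k|^3]$. The sketch below uses a hypothetical Gaussian u(t) of "bandwidth" B and shows this ratio (without the factor $\beta_3$) shrinking like $\sqrt{B}$ as the filter narrows; for this particular pulse the ratio divided by $\sqrt{B}$ works out to $2^{3/4}/\sqrt{3} \approx 0.971$. The values of $f_0$, $t_1$ and the pulse shape are arbitrary illustrative choices.

```python
import numpy as np

T_r, f0, t1 = 1.0, 10.3, 0.37     # hypothetical period, center frequency, instant


def coefficients(B, K=5000):
    """c_m = u(t1 - m T_r) exp(-j 2 pi f0 m T_r) for a hypothetical Gaussian
    u(t) of "bandwidth" B, scaled so that (1/T_r) * int |u|^2 dt = 1."""
    m = np.arange(-K, K + 1)
    u = 2 ** 0.25 * np.sqrt(B * T_r) * np.exp(-np.pi * (B * (t1 - m * T_r)) ** 2)
    return u * np.exp(-2j * np.pi * f0 * m * T_r)


for B in (0.16, 0.04, 0.01):
    c = coefficients(B)
    power = np.sum(np.abs(c) ** 2)                  # E|z(t1)|^2 when E|y|^2 = 1
    ratio = np.sum(np.abs(c) ** 3) / power ** 1.5   # Lyapunov ratio divided by beta_3
    print(f"B = {B}: sum|c|^2 = {power:.6f}, ratio = {ratio:.5f}, "
          f"ratio/sqrt(B) = {ratio / np.sqrt(B):.5f}")
```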
It will be convenient to standardize the complex variates
$z_k = z(t_k)$ to unit average-squared magnitude. Now

$$E[|z_k|^2] = \frac{1}{T_r}\int_{-\infty}^{\infty} |u(t)|^2\,dt \tag{32}$$

(assuming the first average requirement is satisfied).
Thus, our standardization will be possible only if

$$\int_{-\infty}^{\infty} |u(t)|^2\,dt$$

is finite. Assuming this to be the case, standardization of
$z_k$ will be effected by normalizing u(t), so that

$$\frac{1}{T_r}\int_{-\infty}^{\infty} |u(t)|^2\,dt = 1. \tag{33}$$

The subsequent derivations can be made considerably
more compact by dealing with suitably defined density
functions and characteristic functions of multidimensional
complex variates. The density function and characteristic
function of a one-dimensional complex variate will now be
defined, and the generalization to N-dimensional complex
vectors will be clear to the reader. Let v and μ be the real
and imaginary parts of a complex variate w given by

$$w = v + j\mu, \tag{34}$$

and let P(v, μ) be the joint density function of v and μ.
Then the density function of w, $P_w(w)$, will be defined as

$$P_w(w) = P[\mathrm{Re}\{w\}, \mathrm{Im}\{w\}]. \tag{35}$$

An average of some function of w, g(w), with respect to
$P_w(w)$ is to be interpreted as

$$\int g(w)\,P_w(w)\,dw = \iint g(v + j\mu)\,P(v, \mu)\,dv\,d\mu. \tag{36}$$

If the joint characteristic function F(ξ, η) of the variables
v, μ is defined as

$$F(\xi, \eta) = \iint P(v, \mu)\,e^{j(\xi v + \eta\mu)}\,dv\,d\mu, \tag{37}$$

then the characteristic function of w is defined as

$$F_w(\lambda) = \int P_w(w)\,e^{j\,\mathrm{Re}\{w^{*}\lambda\}}\,dw = F[\mathrm{Re}\{\lambda\}, \mathrm{Im}\{\lambda\}], \tag{38}$$

where the complex characteristic function variable

$$\lambda = \xi + j\eta. \tag{39}$$


An overline will be used to indicate a vector, or a
collection of variables. Thus, the symbol $\bar w$ may be used
to denote the set of complex variables $(w_1, w_2, \ldots, w_N)$.
An average of some function of $\bar w$, $g(\bar w)$, is to be interpreted as

$$\int g(\bar w)\,P_{\bar w}(\bar w)\,d\bar w = \int\!\cdots\!\int g(v_1 + j\mu_1, \ldots, v_N + j\mu_N)\,P(v_1, \mu_1, \ldots, v_N, \mu_N)\,dv_1\,d\mu_1 \cdots dv_N\,d\mu_N, \tag{40}$$

where $P_{\bar w}(\bar w)$ is the density function of an N-dimensional
complex variate $\bar w$, and $P(v_1, \mu_1, \ldots, v_N, \mu_N)$ is the joint
density function of the real and imaginary parts of the
components of $\bar w$. The characteristic function of $\bar w$ is
expressed as

$$F_{\bar w}(\bar\lambda) = \int P_{\bar w}(\bar w)\,e^{j\,\mathrm{Re}\{\bar w^{*}\cdot\bar\lambda\}}\,d\bar w, \tag{41}$$

where $\bar\lambda$ is a complex vector $(\lambda_1, \lambda_2, \ldots, \lambda_N)$, and $\bar w^{*}\cdot\bar\lambda$
is the inner or dot product of the vectors $\bar w^{*}$ and $\bar\lambda$.

The complex random variable $z_k$ is given by

$$z_k = \sum_{m} y_m\,u(t_k - mT_r)\,e^{-j2\pi f_0 mT_r} = \sum_{m} e_{mk}, \tag{42}$$

where the random variable

$$e_{mk} = y_m\,u(t_k - mT_r)\,e^{-j2\pi f_0 mT_r}. \tag{43}$$

If the vector $\bar Z$ denotes the set of N complex variates
$(z_1, z_2, \ldots, z_N)$, then it is representable as a sum of
independent complex vectors as follows:

$$\bar Z = \sum_{m} \bar Z_m, \tag{44}$$

where the vector $\bar Z_m$ denotes the set of N complex variates
$(e_{m1}, e_{m2}, \ldots, e_{mN})$.

If the probability density function of the $y_k$ is denoted
by $W_y$, then the probability density function of $\bar Z_m$,
$W_m(\bar Z_m)$, is readily demonstrated to be given by

$$W_m(\bar Z_m) = \int W_y(y)\prod_{n=1}^{N} \delta\big[e_{mn} - c_{mn}\,y\big]\,dy, \tag{45}$$

where the coefficient

$$c_{mn} = u(t_n - mT_r)\,e^{-j2\pi f_0 mT_r}, \tag{46}$$

and δ(x) is a unit impulse at x = 0.
From (45) and (41), the characteristic function of $\bar Z_m$
is found to be simply

$$F_m(\bar\lambda) = F_y[\bar C_m^{*}\cdot\bar\lambda], \tag{47}$$

where $F_y$ is the common characteristic function of the
$y_k$, and the coefficient vector $\bar C_m$ denotes the set of
coefficients $(c_{m1}, c_{m2}, \ldots, c_{mN})$. The vector $\bar\lambda$ denotes the set of N
complex characteristic function variables $(\lambda_1, \lambda_2, \ldots, \lambda_N)$.
Inasmuch as $\bar Z$ is represented as a sum of independent
random variables, its characteristic function $F(\bar\lambda)$ is
given by the product of the characteristic functions of the
component random variables, and the logarithm of $F(\bar\lambda)$
by the sum


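$$\log F(\bar\lambda) = \sum_{m} \log F_m(\bar\lambda). \tag{48}$$

It is readily demonstrated that if

$$\beta_3 = E[|y_k|^3] < \infty, \tag{49}$$

then $F_y(\lambda)$ has the finite series expansion

$$F_y(\lambda) = 1 - \frac{1}{4}\big[|\lambda|^2 + \mathrm{Re}\{\beta^{*}\lambda^2\}\big] + \frac{1}{6}\,\theta\,\beta_3\,|\lambda|^3, \tag{50}$$

where θ is a complex quantity of modulus not exceeding
unity. Thus,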


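$$F_m(\lambda) = 1 - \tfrac{1}{4}\left[|C_m^{*}\cdot\lambda|^2 + \mathrm{Re}\{\beta^{*}(C_m^{*}\cdot\lambda)^2\}\right] + \tfrac{1}{6}\theta\beta_3|C_m^{*}\cdot\lambda|^3. \qquad (51)$$

We are interested in the behavior of $F_m(\lambda)$ as the filter bandwidth (defined in an appropriate sense) approaches zero. To study this behavior, let $u(t)$ be expressed in the form

$$u(t) = \sqrt{B}\,s(Bt), \qquad (52)$$

where $s(t)$ has unit bandwidth and $u(t)$ has bandwidth $B$. A change in bandwidth of $u(t)$ is then a scale change in the time domain (and also in the frequency domain). The factor $\sqrt{B}$ is needed to maintain the normalization (33). With the aid of (52), (51) may be represented as

$$F_m(\lambda) = 1 - Bg_1 + B^{3/2}g_2, \qquad (53)$$

where the functions

$$g_1 = \tfrac{1}{4}\left[|D_m^{*}\cdot\lambda|^2 + \mathrm{Re}\{\beta^{*}(D_m^{*}\cdot\lambda)^2\}\right], \qquad g_2 = \tfrac{1}{6}\theta\beta_3|D_m^{*}\cdot\lambda|^3, \qquad (54)$$

and the normalized coefficient vector $D_m$ is given by

$$D_m = \frac{1}{\sqrt{B}}\,C_m. \qquad (55)$$

Note that as $B \to 0$, $D_m \to s(0)\,e^{-j2\pi f_0 mT}\,U$, where $U$ is a vector with unit values for coordinates. Thus for fixed $\lambda$ both $Bg_1$ and $B^{3/2}g_2$ approach zero as $B$ approaches zero. It may then be shown that for sufficiently small $B$, we may represent $\log F_m(\lambda)$ by the finite series

$$\log F_m(\lambda) = -\tfrac{1}{4}\left[|C_m^{*}\cdot\lambda|^2 + \mathrm{Re}\{\beta^{*}(C_m^{*}\cdot\lambda)^2\}\right] + \tfrac{1}{4}\theta_1\beta_3|C_m^{*}\cdot\lambda|^3, \qquad (56)$$

where $\theta_1$ is a complex quantity of modulus not exceeding unity. It follows that for sufficiently small $B$,

$$\log F(\lambda) = -\tfrac{1}{4}\sum_{m=-\infty}^{\infty}\left[|C_m^{*}\cdot\lambda|^2 + \mathrm{Re}\{\beta^{*}(C_m^{*}\cdot\lambda)^2\}\right] + \tfrac{1}{4}\beta_3\sum_{m=-\infty}^{\infty}\theta_{1m}|C_m^{*}\cdot\lambda|^3, \qquad (57)$$

or, expanding the quadratic terms in the components of $\lambda$,

$$\log F(\lambda) = -\tfrac{1}{4}\sum_{p,q=1}^{N}\lambda_p\lambda_q^{*}\sum_{m=-\infty}^{\infty}C_{mp}^{*}C_{mq} - \tfrac{1}{4}\,\mathrm{Re}\Big\{\beta^{*}\sum_{p,q=1}^{N}\lambda_p\lambda_q\sum_{m=-\infty}^{\infty}C_{mp}^{*}C_{mq}^{*}\Big\} + \tfrac{1}{4}\beta_3\sum_{m=-\infty}^{\infty}\theta_{1m}|C_m^{*}\cdot\lambda|^3. \qquad (58)$$

Now the last sum in (57) is bounded as shown in (59):

$$\Big|\tfrac{1}{4}\beta_3\sum_{m=-\infty}^{\infty}\theta_{1m}|C_m^{*}\cdot\lambda|^3\Big| \le \tfrac{1}{4}\beta_3\sum_{p,q,r=1}^{N}|\lambda_p\lambda_q\lambda_r|\sum_{m=-\infty}^{\infty}|C_{mp}C_{mq}C_{mr}|. \qquad (59)$$

If we define

[Eq. (60): illegible in the source scan.]

then the summation over $m$ becomes

[Eq. (61): illegible in the source scan.]

where $i(t)$ is the unit impulse train of (6). By using a frequency domain interpretation, it is seen that if we assume that the spectrum of [expression illegible in the source scan] is confined to frequencies within the region $-f_1 < f < f_1$, then the above sum becomes

$$\frac{1}{T}\int_{-\infty}^{\infty}|u(t - \tau_p)\,u(t - \tau_q)\,u(t - \tau_r)|\,dt. \qquad (62)$$

However, it is readily shown by a simple application of the Hölder inequality for integrals that

$$\int_{-\infty}^{\infty}|u(t - \tau_p)\,u(t - \tau_q)\,u(t - \tau_r)|\,dt \le \int_{-\infty}^{\infty}|u(t)|^3\,dt = \sqrt{B}\int_{-\infty}^{\infty}|s(t)|^3\,dt. \qquad (63)$$

Thus, the last sum in (57) is bounded by

$$\frac{\sqrt{B}\,\beta_3}{4T}\sum_{p,q,r=1}^{N}|\lambda_p\lambda_q\lambda_r|\int_{-\infty}^{\infty}|s(t)|^3\,dt. \qquad (64)$$

Since $s(t)$ is bounded and of integrable squared magnitude, $\int_{-\infty}^{\infty}|s(t)|^3\,dt$ is finite. Thus for fixed $\lambda$, the last sum approaches zero as $\sqrt{B}$. It follows that, if the average requirements are met,

$$\lim_{B\to 0}\log F(\lambda) = -\tfrac{1}{4}\sum_{p,q=1}^{N}R(t_p - t_q)\,\lambda_p\lambda_q^{*}, \qquad (65)$$

or equivalently

$$\lim_{B\to 0}F(\lambda) = \exp\Big\{-\tfrac{1}{4}\sum_{p,q=1}^{N}R(t_p - t_q)\,\lambda_p\lambda_q^{*}\Big\}, \qquad (66)$$

which is the normal characteristic function. The continuity theorem for characteristic functions may be applied to show that the joint distribution function of $z(t_1), z(t_2), \ldots, z(t_N)$ converges to the normal distribution function.

G. H. Hardy, J. E. Littlewood, and G. Pólya, "Inequalities," Cambridge University Press, Cambridge, Eng., p. 140; 1959.

H. Cramér, "Mathematical Methods of Statistics," Princeton University Press, Princeton, N. J.; 1946.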
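The mechanism behind the Gaussian limit (66) can be seen in a simpler, real-valued setting. The sketch below is only an illustration under assumptions that are not Bello's: real, zero-mean, unit-variance weights $y_m$ with a hypothetical fourth cumulant `kappa4_y`, the unit-bandwidth pulse $s(t) = e^{-\pi t^2}$, zero delays $\tau_m$, and $f_0 = 0$. Because cumulants of independent terms add, the excess kurtosis of one sample $z$ is $\kappa_4(y)\sum_m c_m^4/(\sum_m c_m^2)^2$; for this pulse and small $BT$ it is essentially $\kappa_4(y)\,BT$, so it vanishes as $B \to 0$, as a Gaussian limit requires.

```python
import numpy as np

def excess_kurtosis(B, T=1.0, t=0.3, kappa4_y=2.0, m_max=20_000):
    """Excess kurtosis of z(t) = sum_m y_m u(t - mT), with u(t) = sqrt(B) s(Bt).

    Illustrative assumptions (not from the paper): independent real y_m with
    zero mean, unit variance and fourth cumulant kappa4_y; s(t) = exp(-pi t^2);
    no random delays and no carrier phase.
    """
    m = np.arange(-m_max, m_max + 1)
    c = np.sqrt(B) * np.exp(-np.pi * (B * (t - m * T)) ** 2)  # c_m = u(t - mT)
    second_cumulant = np.sum(c**2)               # cumulants of independent terms add
    fourth_cumulant = kappa4_y * np.sum(c**4)
    return fourth_cumulant / second_cumulant**2  # zero for a Gaussian variate

for B in (0.1, 0.01, 0.001):
    print(f"B = {B:g}: excess kurtosis = {excess_kurtosis(B):.6f}")
```

With `kappa4_y = 2` and `T = 1` the printed values are close to 0.2, 0.02 and 0.002, i.e., proportional to $B$.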




The Axis Crossings of a Stationary 
Gaussian Markov Process* 


J. A. McFADDEN†


Summary—In a stationary Gaussian Markov process (or Ornstein- 
Uhlenbeck process) the expected number of axis crossings per 
unit time, the probability density of the lengths of axis-crossing 
intervals, and the probability of recurrence at zero level do not 
exist as ordinarily defined. In this paper new definitions are pre- 
sented and some asymptotic formulas are derived. Certain renewal 
equations are approximately satisfied, thereby suggesting an 
asymptotic approach to independence of the lengths of successive 
axis-crossing intervals. Mention is made of an application to the 
filter-clip-filter problem. 


INTRODUCTION 


In two previous papers¹,² the author has described a theory of the axis-crossing intervals of a stationary random process $\xi(t)$. Following the work of Rice,³ relations were given between the following quantities:


1) $\beta$, the expected number of axis crossings per unit time;

2) $P_0(\tau)$, the probability density of the lengths of intervals between successive axis crossings;

3) $U(\tau)\,d\tau$, the probability of a crossing in $(t + \tau, t + \tau + d\tau)$, given a crossing in $(t - dt, t)$;

4) $r(\tau)$, the autocorrelation function of the given process after infinite clipping;

5) $w(\tau_1, \tau_2, \tau_3)$, the fourth product moment of the process after infinite clipping.


Suppose, however, that $\xi(t)$ is a stationary Gaussian Markov process, i.e., an Ornstein-Uhlenbeck process, or the output of an RC low-pass filter when the input is stationary, white Gaussian noise. In this case the previous theory breaks down, since $\beta$, $P_0(\tau)$, and $U(\tau)$ do not exist as defined above. It is the purpose of this paper to extend the theory to include the stationary Gaussian Markov process and to examine some of the consequences.


PROBABILITY OF ONE OR MORE CROSSINGS 


As was shown by Rice,⁴ the expected number of crossings per unit time is infinite for a stationary Gaussian Markov process. For this reason, the previous definition of $\beta$ must be generalized.

* Received by the PGIT, August 2, 1960. This work was supported by the Office of Nav. Res. under Contract Nonr-1100(15).

† School of Elec. Engrg., Purdue University, Lafayette, Ind.

1 J. A. McFadden, "The axis-crossing intervals of random functions," IRE Trans. on Information Theory, vol. IT-2, pp. 146-150; December, 1956.

2 J. A. McFadden, "The axis-crossing intervals of random functions II," IRE Trans. on Information Theory, vol. IT-4, pp. 14-24; March, 1958.

3 S. O. Rice, "Mathematical analysis of random noise," Bell Sys. Tech. J., vol. 23, pp. 282-332; July, 1944; vol. 24, pp. 46-156; January, 1945. See esp. sect. 3.3 and 3.4.

4 Rice, op. cit., sect. 3.3.

Let $B(\Delta)\Delta$ be the probability that one or more crossings occur in the finite interval $(t - \Delta, t)$. [As $\Delta \to 0$, $B(\Delta)$ becomes the constant $\beta$, as previously defined, for processes in which the limit exists.]

Now suppose that $\xi(t)$ is a stationary Gaussian Markov process. The mean value of $\xi(t)$ is assumed to be zero. The autocorrelation function of $\xi(t)$ is exponential,⁵ and it is convenient to set the time constant equal to unity. Thus the normalized autocorrelation function is as follows:

$$\rho(\tau) = e^{-|\tau|}. \qquad (1)$$

For such a process, the probability $p(0, \tau)$ that a given interval of length $\tau$ contains no crossings is the function

$$p(0, \tau) = \frac{2}{\pi}\sin^{-1}(e^{-\tau}). \qquad (2)$$


This result was derived by Siegert⁶ and by Slepian.⁷ Thus, for this process, the quantity $B(\Delta)\Delta$ is given by the relation

$$B(\Delta)\Delta = 1 - \frac{2}{\pi}\sin^{-1}(e^{-\Delta}). \qquad (3)$$


If $\Delta$ is small [i.e., compared to unity, the time constant in (1)], then asymptotically,

$$B(\Delta) = \frac{2\sqrt{2}}{\pi}\,\Delta^{-1/2} + O(\Delta^{1/2}). \qquad (4)$$


As was stated previously, $B(\Delta)$ does not remain finite as $\Delta \to 0$; the nature of the singularity, $B(\Delta) \propto \Delta^{-1/2}$, is apparent from (4). In a previous paper,⁸ it was shown that $\beta$ is proportional to the initial slope of $r(\tau)$, the autocorrelation function of $\xi(t)$ after infinite clipping. That derivation cannot be generalized under the present definition of $B(\Delta)$.
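A quick numerical check of (2)-(4) can be sketched in Python. It evaluates $B(\Delta)$ exactly from (3), written with $\cos^{-1}$ because $1 - (2/\pi)\sin^{-1}x = (2/\pi)\cos^{-1}x$ avoids cancellation for small $\Delta$, and compares it with the leading term of (4). The function names are illustrative only.

```python
import math

def p_no_crossing(tau):
    """p(0, tau) of (2): probability of no zero crossing in an interval of length tau."""
    return (2.0 / math.pi) * math.asin(math.exp(-tau))

def B_exact(delta):
    """B(Delta) from (3), using 1 - (2/pi) asin(x) = (2/pi) acos(x) for accuracy."""
    return (2.0 / math.pi) * math.acos(math.exp(-delta)) / delta

def B_leading(delta):
    """Leading term of (4): (2 sqrt(2) / pi) * Delta^(-1/2)."""
    return 2.0 * math.sqrt(2.0) / (math.pi * math.sqrt(delta))

for delta in (1e-1, 1e-2, 1e-4):
    ratio = B_exact(delta) / B_leading(delta)
    print(f"Delta = {delta:g}: B = {B_exact(delta):.6g}, ratio to (4) = {ratio:.6f}")
```

Expanding $\cos^{-1}(e^{-\Delta})$ for small $\Delta$ gives a ratio of $1 - \Delta/6 + O(\Delta^2)$, which is consistent with the $O(\Delta^{1/2})$ remainder in (4).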


5 J. L. Doob, "The Brownian movement and stochastic equations," Ann. Math., vol. 43, pp. 351-369 (1.1.6); April, 1942.

6 A. J. F. Siegert, "On the Roots of Markoffian Random Functions," RAND Corp., Santa Monica, Calif., Rept. No. RM-447; September, 1950.

7 S. O. Rice, "Distribution of the duration of fades in radio transmission: Gaussian noise model," Bell Sys. Tech. J., vol. 37, pp. 581-635 (114); May, 1958.

8 McFadden, op. cit., "The axis-crossing intervals of random functions," (12).


PROBABILITY DENSITY OF THE LENGTH OF AN AXIS-CROSSING INTERVAL

As was shown by Kohlenberg⁹ and by the author,¹⁰ the probability density $P_0(\tau)$ of the length of an interval between successive axis crossings in a stationary process is ordinarily equal to $p''(0, \tau)/\beta$. It is not surprising that such a formula fails in the Gaussian Markov case. The definition of $P_0(\tau)$ must therefore be revised.

Consider those sample functions $\xi(t)$ for which one or more crossings have occurred in $(t - \Delta, t)$. Let $T$ be a random variable such that the next crossing after time $t$ occurs at time $t + T$. Then $P_\Delta(\tau)\,d\tau$ is defined as the probability that $T$ lies in the range between $\tau$ and $\tau + d\tau$. (Cf. the "horizontal window condition" of Kac and Slepian.¹¹) As $\Delta \to 0$, $P_\Delta(\tau)$ becomes the density $P_0(\tau)$, as previously defined, for processes in which the limit exists.

$P_\Delta(\tau)$ will now be expressed in terms of $B(\Delta)$ and $p(0, \tau)$. Consider the following events $E_1$, $E_2$, and $E_3$:


$E_1$: No crossings occur in the interval $(t, t + \tau)$.

$E_2$: No crossings occur in the interval $(t - \Delta, t + \tau)$.

$E_3$: One or more crossings occur in $(t - \Delta, t)$, but none in $(t, t + \tau)$.


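Then the probabilities of these events are related as follows:

$$P\{E_1\} = P\{E_2\} + P\{E_3\}, \qquad (5)$$

$$p(0, \tau) = p(0, \tau + \Delta) + B(\Delta)\Delta\int_{\tau}^{\infty}P_\Delta(x)\,dx. \qquad (6)$$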


| 
The last term of (6) is equivalent to $P\{E_3\}$ because $E_3$ is the event that one or more crossings occur in $(t - \Delta, t)$ and that the next crossing occurs after time $t + \tau$.

After differentiation with respect to $\tau$, (6) yields the following solution for $P_\Delta(\tau)$:


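$$P_\Delta(\tau) = \frac{p'(0, \tau + \Delta) - p'(0, \tau)}{B(\Delta)\Delta}. \qquad (7)$$

[In those cases in which the limit exists, (7) becomes the previous expression $p''(0, \tau)/\beta$ as $\Delta \to 0$.]

For a stationary Gaussian Markov process, by (2) and (3),

$$P_\Delta(\tau) = \frac{e^{-\tau}(1 - e^{-2\tau})^{-1/2} - e^{-(\tau+\Delta)}\left[1 - e^{-2(\tau+\Delta)}\right]^{-1/2}}{\frac{\pi}{2} - \sin^{-1}(e^{-\Delta})}. \qquad (8)$$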


9 A. Kohlenberg, "Notes on the Zero Distribution of Gaussian Noise," M.I.T. Lincoln Lab., Lexington, Mass., Tech. Memo. 44, p. 4; October, 1953.

10 McFadden, op. cit., "The axis-crossing intervals of random functions II," (4).

11 M. Kac and D. Slepian, "Large excursions of Gaussian processes," Ann. Math. Stat., vol. 30, pp. 1215-1228; December, 1959.

12 McFadden, op. cit., "The axis-crossing intervals of random functions II," Appendix I.




As $\tau \to 0$, $P_\Delta(\tau)$ behaves like $\tau^{-1/2}$. For small $\Delta$, no simple asymptotic formula for $P_\Delta(\tau)$ can be given that remains valid as $\tau \to 0$.
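The density (8) can be checked against the distribution function obtained by integrating (7) directly, $\Pr\{T \le \tau\} = 1 - [p(0,\tau) - p(0,\tau+\Delta)]/[B(\Delta)\Delta]$, which also confirms that $P_\Delta(\tau)$ integrates to unity. The helpers below are hypothetical names for a sketch using SciPy's `quad`; the substitution $\tau = u^2$ removes the $\tau^{-1/2}$ endpoint behavior.

```python
import math
from scipy.integrate import quad

def p_no_crossing(tau):
    """p(0, tau) of (2)."""
    return (2.0 / math.pi) * math.asin(math.exp(-tau))

def kernel(tau):
    """e^{-tau} (1 - e^{-2 tau})^{-1/2}, i.e. -(pi/2) times the derivative of p(0, tau)."""
    return math.exp(-tau) / math.sqrt(-math.expm1(-2.0 * tau))

def P_delta(tau, delta):
    """Density (8); its denominator pi/2 - asin(e^{-Delta}) equals acos(e^{-Delta})."""
    return (kernel(tau) - kernel(tau + delta)) / math.acos(math.exp(-delta))

def cdf_from_7(tau, delta):
    """Pr{T <= tau}, obtained by integrating (7) from 0 to tau."""
    b_delta_times_delta = 1.0 - p_no_crossing(delta)  # B(Delta) * Delta, from (3)
    return 1.0 - (p_no_crossing(tau) - p_no_crossing(tau + delta)) / b_delta_times_delta

def cdf_numeric(tau, delta):
    """Same probability by quadrature of (8), with tau = u^2 (d tau = 2u du)."""
    def integrand(u):
        if u == 0.0:
            return math.sqrt(2.0) / math.acos(math.exp(-delta))  # limit of 2u P_Delta(u^2)
        return 2.0 * u * P_delta(u * u, delta)
    value, _ = quad(integrand, 0.0, math.sqrt(tau), limit=200)
    return value

delta = 0.05
for tau in (0.01, 0.5, 3.0):
    print(tau, cdf_numeric(tau, delta), cdf_from_7(tau, delta))
```

The two columns should agree to quadrature accuracy, and `cdf_from_7` tends to 1 as $\tau$ grows.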

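The Laplace transform of $P_\Delta(\tau)$ is more manageable. By the use of tables,¹³

$$p_\Delta(s) = \int_0^{\infty}e^{-s\tau}P_\Delta(\tau)\,d\tau = \frac{\sqrt{\pi}\,(1 - e^{s\Delta})\,\Gamma(\tfrac{1}{2}s + \tfrac{1}{2})/\Gamma(\tfrac{1}{2}s + 1) + 2e^{s\Delta}I_\Delta(s)}{\pi - 2\sin^{-1}(e^{-\Delta})}, \qquad (9)$$

where

$$I_\Delta(s) = \int_0^{\Delta}e^{-(s+1)x}(1 - e^{-2x})^{-1/2}\,dx. \qquad (10)$$

The following asymptotic expansion for $p_\Delta(s)$ exists for small $\Delta$:

$$p_\Delta(s) = 1 - \left(\frac{\pi}{2}\right)^{1/2}\frac{\Gamma(\tfrac{1}{2}s + \tfrac{1}{2})}{\Gamma(\tfrac{1}{2}s)}\,\Delta^{1/2} + O(\Delta). \qquad (11)$$

As $\Delta \to 0$ this becomes the transform of a $\delta$ function.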


Asymptotic expressions for the first and second moments of $T$ follow from (11):

$$E(T) = \frac{\pi}{2\sqrt{2}}\,\Delta^{1/2} + O(\Delta), \qquad (12)$$

$$E(T^2) = \frac{\pi\log 2}{\sqrt{2}}\,\Delta^{1/2} + O(\Delta^{3/2}). \qquad (13)$$

The ratio of the variance $D^2(T)$ to the mean $E(T)$ has a finite limit, since

$$\frac{D^2(T)}{E(T)} = 2\log 2 + O(\Delta^{1/2}). \qquad (14)$$

In the previous theory, $E(T) = 1/\beta$. A similar relation holds here too, asymptotically, for by (4) and (12),

$$B(\Delta)E(T) = 1 + O(\Delta^{1/2}). \qquad (15)$$
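The asymptotic moments (12)-(15) can be compared with moments computed by quadrature from (8) for a small $\Delta$. This is a sketch only: the value $\Delta = 10^{-4}$, the split point of the integration range and the helper names are arbitrary choices, and the relative differences should shrink as $\Delta$ decreases, consistent with the stated remainders.

```python
import math
from scipy.integrate import quad

def kernel(tau):
    """e^{-tau} (1 - e^{-2 tau})^{-1/2}."""
    return math.exp(-tau) / math.sqrt(-math.expm1(-2.0 * tau))

def P_delta(tau, delta):
    """Density (8), with denominator written as acos(e^{-Delta})."""
    return (kernel(tau) - kernel(tau + delta)) / math.acos(math.exp(-delta))

def moment(k, delta):
    """E(T^k), k >= 1, by quadrature with tau = u^2 (d tau = 2u du)."""
    def integrand(u):
        if u == 0.0:
            return 0.0
        return 2.0 * u * (u * u) ** k * P_delta(u * u, delta)
    split = 10.0 * math.sqrt(delta)  # P_Delta changes character on the scale tau ~ Delta
    head, _ = quad(integrand, 0.0, split, limit=200)
    tail, _ = quad(integrand, split, math.inf, limit=200)
    return head + tail

delta = 1e-4
ET, ET2 = moment(1, delta), moment(2, delta)
B = (2.0 / math.pi) * math.acos(math.exp(-delta)) / delta  # B(Delta) from (3)
print("E(T)   :", ET, "vs (12):", math.pi / (2 * math.sqrt(2)) * math.sqrt(delta))
print("E(T^2) :", ET2, "vs (13):", math.pi * math.log(2) / math.sqrt(2) * math.sqrt(delta))
print("D^2/E  :", (ET2 - ET**2) / ET, "vs (14):", 2 * math.log(2))
print("B E(T) :", B * ET, "vs (15): 1")
```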


PROBABILITY OF RECURRENCE 


For a general stationary Gaussian process, the probability $U(\tau)\,d\tau$ of a crossing in $(t + \tau, t + \tau + d\tau)$, given a crossing in $(t - dt, t)$, was derived by Rice.¹⁴ If, however, $\xi(t)$ is a Markov process, then Rice's derivation is not applicable, since $\xi'(t)$ has an infinite variance.

In a previous paper by the author,¹⁵ $U(\tau)$ was related to the fourth product moment after infinite clipping.

13 Bateman Manuscript Project, "Tables of Integral Transforms," McGraw-Hill Book Co., Inc., New York, N. Y., vol. 1, p. 261(1); p. 129(4), (5); 1954.

14 Rice, op. cit., (3.410).

15 McFadden, op. cit., "The axis-crossing intervals of random functions II," Appendix II.




That derivation is extended below. Let $U_{\Delta\delta}(\tau)\,\delta$ be the conditional probability that one or more crossings occur in $(t + \tau, t + \tau + \delta)$, given that one or more crossings have occurred in $(t - \Delta, t)$. Then $B(\Delta)U_{\Delta\delta}(\tau)\Delta\delta$ is the joint probability of one or more crossings in $(t - \Delta, t)$ and one or more crossings in $(t + \tau, t + \tau + \delta)$. [As $\Delta \to 0$ and $\delta \to dt$, $U_{\Delta\delta}(\tau)$ becomes $U(\tau)$, as previously defined, for processes in which the limit exists.]

The number of crossings in $(t - \Delta, t)$ and the number in $(t + \tau, t + \tau + \delta)$ can be either odd or even. The four events "odd-odd," "odd-even," "even-odd," and "even-even" have nearly the same probability. Because of the clustering of axis crossings in a stationary Gaussian Markov process, the difference between any two of these probabilities is of higher order in $\Delta$ or $\delta$; in other words, only a minute change in the magnitude of $\delta$ or $\Delta$ is necessary to change the number of crossings by one. Thus $B(\Delta)U_{\Delta\delta}(\tau)\Delta\delta$ is equal to four times the probability that a net sign change occurs in $(t - \Delta, t)$ and another in $(t + \tau, t + \tau + \delta)$, plus higher-order terms in $\Delta$ and $\delta$.

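Let $P_{-++-}$ be the probability that $\xi(t - \Delta) < 0$, $\xi(t) > 0$, $\xi(t + \tau) > 0$, and $\xi(t + \tau + \delta) < 0$, and correspondingly for other combinations of signs. Then for small $\Delta$ and $\delta$,

$$B(\Delta)U_{\Delta\delta}(\tau)\Delta\delta \approx 4\left(P_{-++-} + P_{-+-+} + P_{+-+-} + P_{+--+}\right), \qquad (16)$$

or by symmetry,

$$B(\Delta)U_{\Delta\delta}(\tau)\Delta\delta \approx 8\left(P_{-++-} + P_{-+-+}\right). \qquad (17)$$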

In another paper by the author,¹⁶ the various probabilities $P_{-++-}$, etc., have been expressed in terms of moments of the process $\xi(t)$ after infinite clipping. Let

$$x(t) = \begin{cases} +1 & \text{when } \xi(t) > 0, \\ -1 & \text{when } \xi(t) < 0. \end{cases} \qquad (18)$$

Let $x_1 = x(t - \Delta)$, $x_2 = x(t)$, $x_3 = x(t + \tau)$, and $x_4 = x(t + \tau + \delta)$. Furthermore, let the correlation coefficients after clipping be $r_{ij} = E(x_i x_j)$, and let the fourth product moment be $w = E(x_1x_2x_3x_4)$. Then¹⁶

$$P_{-++-} = \tfrac{1}{16}\left[1 - r_{12} - r_{13} + r_{14} + r_{23} - r_{24} - r_{34} + w\right],$$

$$P_{-+-+} = \tfrac{1}{16}\left[1 - r_{12} + r_{13} - r_{14} - r_{23} + r_{24} - r_{34} + w\right], \qquad (19)$$

and (17) becomes

$$B(\Delta)U_{\Delta\delta}(\tau)\Delta\delta \approx 1 - r_{12} - r_{34} + w. \qquad (20)$$
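Formulas (19) and (20) are algebraic consequences of expanding the indicator of a sign pattern, $\prod_i (1 + s_i x_i)/2$, for any distribution of $(x_1, \ldots, x_4)$ with $P(x) = P(-x)$, so that first and third moments vanish as they do for a clipped zero-mean Gaussian process. The sketch below checks them on an arbitrary, hypothetical sign-symmetric distribution; it does not test the small-$\Delta$, small-$\delta$ approximation made in (16)-(17).

```python
import itertools
import random

def clipped_sign_probabilities(seed=1):
    """Check (19)-(20) on a hypothetical sign-symmetric distribution of (x1, x2, x3, x4).

    The weights are random, not Gaussian; only the symmetry P(x) = P(-x) matters.
    """
    rng = random.Random(seed)
    prob = {}
    for x in itertools.product((-1, 1), repeat=4):
        if x not in prob:
            weight = rng.random()
            prob[x] = weight
            prob[tuple(-v for v in x)] = weight   # enforce P(x) = P(-x)
    total = sum(prob.values())
    prob = {x: p / total for x, p in prob.items()}

    def expect(f):
        return sum(p * f(x) for x, p in prob.items())

    r = {(i, j): expect(lambda x: x[i - 1] * x[j - 1])   # r_ij = E(x_i x_j)
         for i in range(1, 5) for j in range(i + 1, 5)}
    w = expect(lambda x: x[0] * x[1] * x[2] * x[3])      # fourth product moment
    p_mppm = (1 - r[1, 2] - r[1, 3] + r[1, 4] + r[2, 3] - r[2, 4] - r[3, 4] + w) / 16
    p_mpmp = (1 - r[1, 2] + r[1, 3] - r[1, 4] - r[2, 3] + r[2, 4] - r[3, 4] + w) / 16
    return prob, p_mppm, p_mpmp, 1 - r[1, 2] - r[3, 4] + w

prob, p_mppm, p_mpmp, rhs_20 = clipped_sign_probabilities()
print(prob[(-1, 1, 1, -1)], p_mppm)
print(prob[(-1, 1, -1, 1)], p_mpmp)
print(8 * (p_mppm + p_mpmp), rhs_20)
```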


16 J. A. McFadden, "Urn models of correlation and a comparison with the multivariate normal integral," Ann. Math. Stat., vol. 26, pp. 478-489; September, 1955. See sect. 6.


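Since $\xi(t)$ is Gaussian, $r_{ij}$ is given by the arcsine formula,¹⁷

$$r_{ij} = \frac{2}{\pi}\sin^{-1}\rho_{ij}, \qquad (21)$$

where $\rho_{ij}$ is the correlation coefficient before clipping. A series for the fourth product moment of a stationary Gaussian Markov process, after clipping, has been derived by McFadden.¹⁸ The result is

$$w = \frac{4}{\pi^2}\Big[\sin^{-1}\rho_{12}\,\sin^{-1}\rho_{34} + \rho_{12}\rho_{34}(1 - \rho_{12}^2)^{1/2}(1 - \rho_{34}^2)^{1/2}\sum_{m=1}^{\infty}\frac{\rho_{23}^{2m}}{m!}\,\frac{[(-\tfrac{1}{2})_m]^2}{(\tfrac{1}{2})_m}\,F(1 - m, 1; \tfrac{3}{2} - m; \rho_{12}^2)\,F(1 - m, 1; \tfrac{3}{2} - m; \rho_{34}^2)\Big], \qquad (22)$$

where $(a)_m = \Gamma(a + m)/\Gamma(a)$. In (21) and (22),

$$\rho_{12} = e^{-\Delta}, \qquad \rho_{23} = e^{-\tau}, \qquad \rho_{34} = e^{-\delta}. \qquad (23)$$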


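Now for small $\Delta$ and $\delta$, the arcsines may be expanded and the arguments of the hypergeometric functions may be replaced by unity, the errors being of higher order. Thus, using the summation formula for $F(a, b; c; 1)$,¹⁹ for $\rho_{12} = e^{-\Delta}$,

$$(1 - \rho_{12}^2)^{1/2}F(1 - m, 1; \tfrac{3}{2} - m; \rho_{12}^2) = (2\Delta)^{1/2}F(1 - m, 1; \tfrac{3}{2} - m; 1) + O(\Delta^{3/2}) = (2\Delta)^{1/2}\,2(-\tfrac{1}{2} + m) + O(\Delta^{3/2}). \qquad (24)$$

Then, since

$$\sum_{m=1}^{\infty}\frac{(\tfrac{1}{2})_m}{m!}\,\rho_{23}^{2m} = (1 - \rho_{23}^2)^{-1/2} - 1, \qquad (25)$$

(22) gives the result

$$w = \frac{4}{\pi^2}\sin^{-1}\rho_{12}\,\sin^{-1}\rho_{34} + \frac{8}{\pi^2}(\Delta\delta)^{1/2}\left[(1 - \rho_{23}^2)^{-1/2} - 1\right] + \text{higher-order terms}. \qquad (26)$$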
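The two ingredients of (26), the terminating-series value $F(1-m, 1; \tfrac{3}{2}-m; 1) = 2(m - \tfrac{1}{2})$ used in (24) and the binomial series (25), are easy to confirm numerically. The helper names are hypothetical; the hypergeometric series is summed term by term because, with $a = 1 - m$, it has only $m$ terms.

```python
import math

def hyp2f1_terminating(a, b, c, z):
    """Gauss series F(a, b; c; z) for a = 0, -1, -2, ..., where the series terminates."""
    if a > 0 or a != int(a):
        raise ValueError("a must be a nonpositive integer")
    term = total = 1.0
    for k in range(-int(a)):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def lhs_24(m, delta):
    """(1 - rho^2)^{1/2} F(1 - m, 1; 3/2 - m; rho^2) with rho = e^{-Delta}."""
    rho2 = math.exp(-2.0 * delta)
    return math.sqrt(-math.expm1(-2.0 * delta)) * hyp2f1_terminating(1 - m, 1, 1.5 - m, rho2)

def rhs_24(m, delta):
    """Leading term of (24): (2 Delta)^{1/2} * 2(m - 1/2)."""
    return math.sqrt(2.0 * delta) * 2.0 * (m - 0.5)

def series_25(x, terms=400):
    """Left side of (25) with x = rho_23^2: sum over m >= 1 of (1/2)_m / m! * x^m."""
    coeff, total = 1.0, 0.0
    for m in range(1, terms + 1):
        coeff *= (m - 0.5) / m * x   # update (1/2)_m / m! * x^m from the previous term
        total += coeff
    return total

for m in range(1, 6):
    print(m, hyp2f1_terminating(1 - m, 1, 1.5 - m, 1.0), lhs_24(m, 1e-6) / rhs_24(m, 1e-6))
x = math.exp(-1.0)
print(series_25(x), (1 - x) ** -0.5 - 1)
```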


Finally (21) may be expanded and substituted into (20), along with (26). The first few terms cancel and (4) may be used; then

$$U_{\Delta\delta}(\tau) \approx B(\delta)(1 - e^{-2\tau})^{-1/2}. \qquad (27)$$

Hence for a small but finite value of $\delta$, $U_{\Delta\delta}(0)$ is infinite but $U_{\Delta\delta}(\tau)$ decreases monotonically as $\tau$ increases, approaching the value $B(\delta)$. This limit is correct since $B(\delta)\delta$ is the probability of one or more crossings in an interval of length $\delta$; the initial condition has vanishing influence as $\tau \to \infty$. The leading term (27) does not contain $\Delta$.

17 J. L. Lawson and G. E. Uhlenbeck, "Threshold Signals," McGraw-Hill Book Co., Inc., New York, N. Y., p. 57; 1950.

18 J. A. McFadden, "Two expansions for the quadrivariate normal integral," Biometrika, vol. 47, pp. 325-333; December, 1960.

19 Bateman Manuscript Project, "Higher Transcendental Functions," McGraw-Hill Book Co., Inc., New York, N. Y., vol. 1, p. 61, (14); 1953. This formula would ordinarily be valid only when c > a + b, but, since this series terminates, the restriction is unnecessary.


RENEWAL EQUATIONS 


If it is assumed that the lengths of successive axis-crossing intervals are statistically independent (i.e., they form a renewal process), then certain well-known relations exist²⁰ between the Laplace transforms of $P_0(\tau)$, $U(\tau)$ and $r(\tau)$. If $u(s)$ and $\bar{r}(s)$ are the Laplace transforms of $U(\tau)$ and $r(\tau)$, respectively, and $p_0(s)$ is that of $P_0(\tau)$, then it is easy to show that

$$u(s) = \frac{p_0(s)}{1 - p_0(s)} \qquad (28)$$

and

$$\bar{r}(s) = \frac{1}{s} - \frac{2\beta}{s^2}\,\frac{1 - p_0(s)}{1 + p_0(s)}. \qquad (29)$$

On the other hand, it was shown by Palmer²¹ and McFadden²² that for stationary Gaussian processes in which the autocorrelation function $\rho(\tau)$ possesses certain derivatives at the origin, the lengths of successive axis-crossing intervals cannot be independent. The Markov case, being somewhat unique, must be investigated separately.

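The Laplace transform of $U_{\Delta\delta}(\tau)$ in (27) is, asymptotically,

$$u_{\Delta\delta}(s) \approx B(\delta)\,\frac{\sqrt{\pi}}{2}\,\frac{\Gamma(\tfrac{1}{2}s)}{\Gamma(\tfrac{1}{2}s + \tfrac{1}{2})}, \qquad (30)$$

and by (11),

$$p_\delta(s) = 1 - \left(\frac{\pi}{2}\right)^{1/2}\frac{\Gamma(\tfrac{1}{2}s + \tfrac{1}{2})}{\Gamma(\tfrac{1}{2}s)}\,\delta^{1/2} + O(\delta). \qquad (31)$$

Thus, asymptotically for small $\delta$, or $\Delta$, a renewal equation similar to (28) is satisfied between $p_\delta(s)$ and $u_{\Delta\delta}(s)$, or between $p_\Delta(s)$ and $u_{\Delta\Delta}(s)$:

$$u_{\Delta\delta}(s) \approx \frac{p_\delta(s)}{1 - p_\delta(s)}. \qquad (32)$$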
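The renewal-type relation (32) can be checked numerically by evaluating $p_\delta(s)$ exactly from (9)-(10) for a small $\delta$ and comparing $p_\delta/(1 - p_\delta)$ with the leading-order transform (30). This sketch uses SciPy's `quad` and `gamma`; the function names and the value $\delta = 10^{-6}$ are illustrative choices.

```python
import math
from scipy.integrate import quad
from scipy.special import gamma

def I_delta(s, delta):
    """I_Delta(s) of (10); x = v^2 removes the endpoint singularity."""
    def integrand(v):
        if v == 0.0:
            return math.sqrt(2.0)  # limit of 2v (1 - e^{-2v^2})^{-1/2}
        return 2.0 * v * math.exp(-(s + 1.0) * v * v) / math.sqrt(-math.expm1(-2.0 * v * v))
    value, _ = quad(integrand, 0.0, math.sqrt(delta), epsabs=0.0, epsrel=1e-12)
    return value

def p_exact(s, delta):
    """p_Delta(s) from (9); note that pi - 2 asin(e^{-Delta}) = 2 acos(e^{-Delta})."""
    f0 = math.sqrt(math.pi) * gamma(0.5 * s + 0.5) / gamma(0.5 * s + 1.0)
    numerator = -math.expm1(s * delta) * f0 + 2.0 * math.exp(s * delta) * I_delta(s, delta)
    return numerator / (2.0 * math.acos(math.exp(-delta)))

def p_asymptotic(s, delta):
    """Expansion (11)/(31) without the O(Delta) remainder."""
    return 1.0 - math.sqrt(math.pi / 2.0) * gamma(0.5 * s + 0.5) / gamma(0.5 * s) * math.sqrt(delta)

def u_asymptotic(s, delta):
    """Leading-order transform (30), with B(delta) taken from (3)."""
    B = (2.0 / math.pi) * math.acos(math.exp(-delta)) / delta
    return B * math.sqrt(math.pi) / 2.0 * gamma(0.5 * s) / gamma(0.5 * s + 0.5)

delta = 1e-6
for s in (0.5, 1.0, 2.0):
    p = p_exact(s, delta)
    print(s, p, p_asymptotic(s, delta), p / (1.0 - p), u_asymptotic(s, delta))
```

At $s = 0$ the exact transform equals 1, since $I_\Delta(0) = \cos^{-1}(e^{-\Delta})$, which is a convenient sanity check on (9).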


Consider next the autocorrelation function after clipping. By (1) and (21),

$$r(\tau) = \frac{2}{\pi}\sin^{-1}(e^{-\tau}). \qquad (33)$$

20 McFadden, op. cit., "The axis-crossing intervals of random functions II," (30) and (31), and the references cited therein.

21 D. S. Palmer, "Properties of random functions," Proc. Cambridge Phil. Soc., vol. 52, pp. 672-686; October, 1956.

22 J. A. McFadden, "The fourth product moment of infinitely clipped noise," IRE Trans. on Information Theory, vol. IT-4, pp. 159-162; December, 1958.


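It follows from integral tables²³ that the Laplace transform is

$$\bar{r}(s) = \frac{1}{s} - \frac{1}{\sqrt{\pi}\,s}\,\frac{\Gamma(\tfrac{1}{2}s + \tfrac{1}{2})}{\Gamma(\tfrac{1}{2}s + 1)}. \qquad (34)$$

On the other hand, by (4) and (11),

$$\frac{2B(\Delta)}{s^2}\,\frac{1 - p_\Delta(s)}{1 + p_\Delta(s)} = \frac{1}{\sqrt{\pi}\,s}\,\frac{\Gamma(\tfrac{1}{2}s + \tfrac{1}{2})}{\Gamma(\tfrac{1}{2}s + 1)} + O(\Delta^{1/2}). \qquad (35)$$

Thus, asymptotically for small $\Delta$, a renewal equation similar to (29) is satisfied, where $\beta$ has been replaced by $B(\Delta)$ and $p_0(s)$ by $p_\Delta(s)$:

$$\bar{r}(s) = \frac{1}{s} - \frac{2B(\Delta)}{s^2}\,\frac{1 - p_\Delta(s)}{1 + p_\Delta(s)} + O(\Delta^{1/2}). \qquad (36)$$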
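A numerical check of (34) and (36) can be sketched as follows: the Laplace transform of $r(\tau)$ in (33) is computed by quadrature and compared with the closed form (34), and the right side of (36) is evaluated with $B(\Delta)$ from (3) and $p_\Delta(s)$ from (11). Function names and the value $\Delta = 10^{-8}$ are illustrative.

```python
import math
from scipy.integrate import quad
from scipy.special import gamma

def r_clipped(tau):
    """Clipped autocorrelation r(tau) of (33)."""
    return (2.0 / math.pi) * math.asin(math.exp(-tau))

def r_bar_numeric(s):
    """Laplace transform of (33) by quadrature; tau = u^2 on [0, 1] smooths the origin."""
    head, _ = quad(lambda u: 2.0 * u * math.exp(-s * u * u) * r_clipped(u * u), 0.0, 1.0)
    tail, _ = quad(lambda tau: math.exp(-s * tau) * r_clipped(tau), 1.0, math.inf)
    return head + tail

def r_bar_34(s):
    """Closed form (34)."""
    return 1.0 / s - gamma(0.5 * s + 0.5) / (math.sqrt(math.pi) * s * gamma(0.5 * s + 1.0))

def r_bar_36(s, delta):
    """Right side of (36) without the remainder, with p_Delta(s) from (11)."""
    B = (2.0 / math.pi) * math.acos(math.exp(-delta)) / delta
    p = 1.0 - math.sqrt(math.pi / 2.0) * gamma(0.5 * s + 0.5) / gamma(0.5 * s) * math.sqrt(delta)
    return 1.0 / s - 2.0 * B / s**2 * (1.0 - p) / (1.0 + p)

for s in (0.5, 1.0, 3.0):
    print(s, r_bar_numeric(s), r_bar_34(s), r_bar_36(s, 1e-8))
```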


APPLICATION TO THE FILTER-CLIP-FILTER PROBLEM


The above theory of axis crossings has an application in the following problem. Let $\xi(t)$ be a stationary Gaussian Markov process, as defined previously, i.e., the output of an RC low-pass filter (with RC = 1), when the input is stationary, white Gaussian noise. Let $x(t)$ be the output after $\xi(t)$ is infinitely clipped, as in (18). Now let $x(t)$ be the input to a second RC low-pass filter with RC = T, and let $y(t)$ be the final output. It is desired to find moments or, if possible, the distribution of $y(t)$. This may be called the "filter-clip-filter" problem.

Previously McFadden²⁴ has studied the distribution of the output of an RC filter when the input is a stationary binary random process. The lengths of the axis-crossing intervals of the input were assumed to be statistically independent and identically distributed.

If this method is applied to the filter-clip-filter problem, two difficulties arise. The first is the questionability of the assumption of the independence of the lengths of successive axis-crossing intervals. The second is the fact that $p_0(s) = 1$, causing some of the formulas to become indeterminate.

Nevertheless, proceeding formally with $p_\Delta(s)$ from (11) in place of $p_0(s)$, expressions for $E[y^2(t)]$ and $E[y^4(t)]$ have been obtained. When $\Delta \to 0$, these expressions agree with those obtained by other methods.

Complete results of the filter-clip-filter investigation will be published at a later date.


CONCLUSIONS 


Although $\beta$, $P_0(\tau)$, and $U(\tau)$ do not exist (as previously defined) for a stationary Gaussian Markov process, asymptotic formulas for analogous quantities have been obtained. Even if strict independence of the lengths of successive axis-crossing intervals is not easily defined, the results suggest a type of asymptotic independence.


23 W. Gröbner and N. Hofreiter, "Integraltafel," Springer-Verlag, Vienna, Austria, vol. 2, p. 152, (5a); 1958.

24 J. A. McFadden, "The probability density of the output of an RC filter when the input is a binary random process," IRE Trans. on Information Theory, vol. IT-5, pp. 174-178; December, 1959.




On Optimal Diversity Reception* 


GEORGE L. TURIN†, SENIOR MEMBER, IRE

Summary—The ideal probability-computing M-ary receiver is derived for a fading, noisy, multidiversity channel, in which the link fadings may be mutually correlated, as may the link noises. The results are interpreted in terms of block diagrams involving various filtering operations. Two special cases, those of very fast and very slow fading, are considered in detail.

I. INTRODUCTION

We shall examine in this paper the following hypothesis-testing problem, which is depicted in Fig. 1. One of a set of $M$ waveforms, here denoted by their complex analytic representations,¹ $\xi_i(t)$ ($i = 1, 2, \ldots, M$), is transmitted into a channel which comprises $L$ diversity links, i.e., $L$ ways by which the transmitted waveform can reach the receiver. These links are not deterministic, however; each is perturbed by two time-varying random disturbances, one multiplicative in nature and the other additive. Denoting the two disturbances in the $l$th of the $L$ links by the complex analytic representations $\gamma_l(t)$ and $\nu_l(t)$, respectively, we accordingly write the output of the $l$th link as

$$\zeta_l(t) = \gamma_l(t)\,\xi_m(t) + \nu_l(t), \qquad (1)$$

where we have assumed that $\xi_m(t)$ was transmitted. (Note that $|\gamma_l(t)|$ and $\tan^{-1}[\mathrm{Im}\,\gamma_l(t)/\mathrm{Re}\,\gamma_l(t)]$ represent, respectively, the random amplitude and phase modulations, i.e., fading, suffered by the transmitted signal on traversing the $l$th link.¹) The $\gamma_l(t)$'s may be correlated amongst themselves, as may be the $\nu_l(t)$'s; we shall assume, however, that the fadings and additive noises are statistically independent.

The receiver at the output of the channel has available $L$ inputs of the form of (1), but does not know the value of the index $m$. It is called upon to guess the true value of $m$ on the basis of its observations of these inputs. This guess is the receiver's output.

Clearly, the set of received waveforms, $\{\zeta_l(t)\}$, may be written as a vector (column matrix) $Z(t)$, the $l$th component of which is $\zeta_l(t)$. If we similarly write the sets of functions $\{\gamma_l(t)\}$ and $\{\nu_l(t)\}$ as the stochastic vectors $\Gamma(t)$ and $N(t)$, we may write $Z(t)$ as

$$Z(t) = \xi_m(t)\,\Gamma(t) + N(t). \qquad (2)$$

* Received by the PGIT, August 24, 1960.

† Hughes Research Labs., A Division of Hughes Aircraft Co., Malibu, Calif.

1 Re $\xi_i(t)$ is the actual $i$th physical waveform, and Im $\xi_i(t)$ is defined as the Hilbert transform of Re $\xi_i(t)$ [see (27) and (28) for an example of Hilbert transform relations]. For narrow-band waveforms, we may approximately identify $|\xi_i(t)|$ with the envelope, and $\tan^{-1}[\mathrm{Im}\,\xi_i(t)/\mathrm{Re}\,\xi_i(t)]$ with the phase, of the $i$th physical waveform. Cf. P. Woodward, "Probability and Information Theory, with Applications to Radar," McGraw-Hill Book Co., Inc., New York, N. Y.; 1953.


We may then define the task of the receiver as that of transforming the stochastic vector $Z(t)$ into a scalar which may assume any one of $M$ values.

Physical examples of communication channels of this type are numerous. Perhaps the most obvious is that of space diversity, where $\zeta_l(t)$ represents the signal received by way of the $l$th of $L$ antennas. Again, with the mathematically unimportant insertion of known frequency shifts (which may be included for convenience in the $\gamma_l(t)$), the model may be made to correspond to a frequency-diversity situation in which $\zeta_l(t)$ represents the signal received over the $l$th of $L$ nonoverlapping frequency bands. The model may also depict the time diversity of a resolvable multipath situation in which $\zeta_l(t)$ represents the signal received via the $l$th of $L$ resolvable (i.e., separable) paths of known modulation delay.²

In order to solve the problem just defined, we require certain well-established mathematical results, which are summarized in the following section.


[Block diagram: TRANSMITTER → CHANNEL → RECEIVER → OUTPUT]

Fig. 1—The system under consideration.


II. MATHEMATICAL PRELIMINARIES

A. Representation of Vector Stochastic Processes

Let $X(t)$ be a finite-dimensional vector stochastic process with complex components, $x_i(t)$, all of zero mean, and let the covariance-function matrix³ $K(s, t) = E[X(s)X'^{*}(t)]$ be such that all of its components $E[x_i(s)x_j^{*}(t)]$ exist and are continuous on some finite square $[a \le s \le b,\ a \le t \le b]$.


2 G. L. Turin, "Communication through noisy, random-multipath channels," 1956 IRE Convention Record, pt. 4, pp. 154-166.

3 A prime denotes "transpose"; an asterisk denotes "complex conjugate."


Then the following statements hold;⁴⁻⁹ most of these are immediate extensions of their one-dimensional equivalents.¹⁰

1) The representation

$$X(t) = \sum_k a_k\Phi_k(t) \qquad (3)$$

converges uniformly in the mean square to $X(t)$ on the interval $a \le t \le b$, where the complex scalars $a_k$ are given by

$$a_k = \int_a^b \Phi_k'^{*}(t)X(t)\,dt, \qquad (4)$$

and the $\Phi_k(t)$ are orthonormalized vector eigenfunctions of the matrix integral equation

$$\int_a^b K(s, t)\Phi_k(t)\,dt = \mu_k\Phi_k(s), \qquad a \le s \le b. \qquad (5)$$

2) By virtue of the orthonormality of the $\Phi_k(t)$,

$$\int_a^b \Phi_k'^{*}(t)\Phi_l(t)\,dt = \delta_{kl}, \qquad (6)$$

where $\delta_{kl}$ is the Kronecker delta. Further,

$$E(a_k a_l^{*}) = \mu_k\delta_{kl}, \qquad (7)$$

where $\mu_k$ is the eigenvalue of (5) corresponding to the solution $\Phi_k(t)$.

3) A generalization of Mercer's theorem for one dimension yields the representation

$$K(s, t) = \sum_k \mu_k\Phi_k(s)\Phi_k'^{*}(t), \qquad a \le s, t \le b, \qquad (8)$$

from which it follows that

$$\mu_k = \int_a^b\!\int_a^b \Phi_k'^{*}(s)K(s, t)\Phi_k(t)\,ds\,dt. \qquad (9)$$
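In numerical work the integral equation (5) is commonly approximated by a matrix eigenproblem on a grid (a Nyström-type discretization). The sketch below assumes, purely for illustration, a two-component process on $[0, 1]$ with $K(s,t) = e^{-|s-t|}\,C$, where $C$ is a fixed Hermitian positive-definite matrix, and checks (6), (8) and (9) on the discretized kernel. The helper name and the kernel are hypothetical, not taken from the paper.

```python
import numpy as np

def discretized_eigenfunctions(n=200, a=0.0, b=1.0):
    """Midpoint-rule discretization of the matrix integral equation (5).

    Illustrative covariance (an assumption): K(s, t) = exp(-|s - t|) * C.
    Rows/columns are ordered component-major: index l * n + i <-> (x_l, t_i).
    """
    h = (b - a) / n
    t = a + h * (np.arange(n) + 0.5)
    C = np.array([[1.0, 0.5j], [-0.5j, 2.0]])
    K = np.kron(C, np.exp(-np.abs(t[:, None] - t[None, :])))
    mu, V = np.linalg.eigh(h * K)        # h*K approximates the integral operator in (5)
    Phi = V / np.sqrt(h)                 # columns: sampled Phi_k, normalized as in (6)
    return h, K, mu, Phi

h, K, mu, Phi = discretized_eigenfunctions()
gram = h * Phi.conj().T @ Phi                                  # (6): identity matrix
mercer = (Phi * mu) @ Phi.conj().T                             # (8): sum_k mu_k Phi_k Phi_k'*
mu_from_9 = np.real(np.diag(h * h * Phi.conj().T @ K @ Phi))   # (9)
print(np.allclose(gram, np.eye(len(mu))), np.allclose(mercer, K), np.allclose(mu_from_9, mu))
```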


4 A. C. Zaanen, "Linear Analysis," North Holland Publishing Co., Amsterdam, Netherlands; 1953.

5 J. B. Thomas and L. A. Zadeh, "Note on an integral equation occurring in the prediction, detection, and analysis of multiple time series," IRE Trans. on Information Theory, vol. IT-7, pp. 118-120; April, 1961.

6 E. Wong, "Vector Stochastic Processes in Problems of Communication Theory," Ph.D. Thesis, Princeton University, Princeton, N. J.; May, 1959. See also, J. B. Thomas and E. Wong, "On the statistical theory of optimum demodulation," IRE Trans. on Information Theory, vol. IT-6, pp. 420-425; September, 1960.

7 J. K. Wolf, "On the Detection and Estimation Problem for Multiple Nonstationary Random Processes," Ph.D. Thesis, Princeton University, Princeton, N. J.; October, 1959. See also J. B. Thomas and J. K. Wolf, "On the statistical detection problem for multiple signals," IRE Trans. on Information Theory, to be published.

8 E. J. Kelly and W. L. Root, "Representations of Vector-Valued Random Processes," Lincoln Lab., M.I.T., Lexington, Mass., Group Rept. 55-21; March 7, 1960. Also, J. Math. and Phys., vol. 39, pp. 211-216; October, 1960.

9 A. V. Balakrishnan, "Estimation and detection theory for multiple stochastic processes," J. Math. Anal. and Appl., vol. 1, pp. 386-400; December, 1960.

10 W. B. Davenport, Jr. and W. L. Root, "An Introduction to the Theory of Random Signals and Noise," McGraw-Hill Book Co., Inc., New York, N. Y., pp. 96-101 and Appendix 2; 1958.




4) Let $J(s, t)$ be defined as the inverse of $K(s, t)$ in the sense that if $F(t)$ is a vector, and

$$G(s) = \int_a^b K(s, t)F(t)\,dt, \qquad (10)$$

then

$$F(s) = \int_a^b J(s, t)G(t)\,dt. \qquad (11)$$

(We may thus symbolically write

$$\int_a^b K(s, u)J(u, t)\,du = I\,\delta(s - t), \qquad (12)$$

where $I$ is the unit matrix and $\delta(t)$ is the Dirac delta function.) Then, formally, $J(s, t)$ has the representation

$$J(s, t) = \sum_k \frac{1}{\mu_k}\Phi_k(s)\Phi_k'^{*}(t), \qquad a \le s, t \le b. \qquad (13)$$

A sufficient condition for $J(s, t)$ to exist is that $K(s, t)$ be positive definite [see (18), below].
5) If $K_n(s, t)$ is the $n$th iteration of $K(s, t)$, that is, if

$$K_1(s, t) = K(s, t) \qquad (14)$$

and

$$K_n(s, t) = \int_a^b K(s, u)K_{n-1}(u, t)\,du, \qquad n \ge 2, \qquad (15)$$

then we may write

$$K_n(s, t) = \sum_k \mu_k^{\,n}\Phi_k(s)\Phi_k'^{*}(t). \qquad (16)$$

Further, it is easily seen from (6) and (16) that

$$\mathrm{tr}\int_a^b K_n(t, t)\,dt = \sum_k \mu_k^{\,n}, \qquad (17)$$

where "tr" denotes the matrix trace.
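On a grid of spacing $h$, $du$ becomes $h$ and $\delta(s - t)$ becomes $1/h$ on the diagonal, so (12), (13) and (17) can be checked with ordinary matrix algebra. The sketch below assumes, for illustration only, a two-component kernel $K(s,t) = e^{-|s-t|}\,C$ on $[0, 1]$ with a fixed Hermitian positive-definite $C$; the kernel is hypothetical, not from the paper.

```python
import numpy as np

# Illustrative kernel (an assumption): K(s, t) = exp(-|s - t|) * C on [0, 1], midpoint grid.
n_grid = 150
h = 1.0 / n_grid
t = h * (np.arange(n_grid) + 0.5)
C = np.array([[1.0, 0.5j], [-0.5j, 2.0]])
K = np.kron(C, np.exp(-np.abs(t[:, None] - t[None, :])))
mu, V = np.linalg.eigh(h * K)
Phi = V / np.sqrt(h)                     # sampled eigenfunctions, orthonormal as in (6)

# (13): J(s, t) = sum_k Phi_k(s) Phi_k'*(t) / mu_k
J = (Phi / mu) @ Phi.conj().T
# (12): h * K @ J should equal I * delta(s - t), i.e. I / h on the grid
identity_check = h * h * K @ J

# (14)-(15): K_1 = K, K_n = h * K @ K_{n-1};  (17): h * trace(K_n) = sum_k mu_k^n
traces, power_sums = [], []
K_n = K.copy()
for n in range(1, 4):
    if n > 1:
        K_n = h * K @ K_n
    traces.append(h * np.trace(K_n).real)
    power_sums.append(np.sum(mu ** n))
print(np.allclose(identity_check, np.eye(2 * n_grid), atol=1e-6), np.allclose(traces, power_sums))
```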
6) If $K(s, t)$ is positive definite, i.e., if

$$\int_a^b\!\int_a^b F'^{*}(s)K(s, t)F(t)\,ds\,dt > 0 \qquad (18)$$

holds for any nonzero $F(t)$ which satisfies

$$\int_a^b F'^{*}(t)F(t)\,dt < \infty, \qquad (19)$$

then $\mu_k > 0$, all $k$. Further, the set of vector eigenfunctions, $\{\Phi_k(t)\}$, of (5) is complete, i.e., any $F(t)$ which satisfies (19) may be written in the form

$$F(t) = \sum_k \beta_k\Phi_k(t), \qquad (20)$$

where

$$\beta_k = \int_a^b \Phi_k'^{*}(t)F(t)\,dt \qquad (21)$$

are called the expansion coefficients of $F(t)$.


Note, however, that whether or not the set $\{\Phi_k(t)\}$ is complete, it follows from (8) that the representation

$$\int_a^b K(s, t)F(t)\,dt = \sum_k \mu_k\beta_k\Phi_k(s) \qquad (22)$$

is valid, provided $F(t)$ satisfies (19).

7) If $G(t)$ is another vector for which expressions of the forms of (19), (20) and (21) hold, and we denote the expansion coefficients of $G(t)$ by $\varepsilon_k$, then it follows from (6) that

$$\int_a^b F'^{*}(t)G(t)\,dt = \sum_k \beta_k^{*}\varepsilon_k. \qquad (23)$$

This is a generalized Parseval theorem.

8) If $K(s, t)$ is positive semidefinite, i.e., if (18) may also hold with an equality, then $\mu_k \ge 0$, all $k$, and we may define a "square root" of $K(s, t)$:

$$\sqrt{K}(s, t) = \sum_k \sqrt{\mu_k}\,\Phi_k(s)\Phi_k'^{*}(t), \qquad a \le s, t \le b. \qquad (24)$$

We then have from (6) and (8)

$$\int_a^b \sqrt{K}(s, u)\,\sqrt{K}(u, t)\,du = K(s, t). \qquad (25)$$

All covariance-function matrices are at least positive semidefinite, and are very often positive definite.
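The square root (24) also exists when $K(s, t)$ is only positive semidefinite. The sketch below assumes, for illustration, two perfectly correlated components, $K(s,t) = e^{-|s-t|}\,C$ with the rank-one matrix $C = \begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}$, so that half of the discretized $\mu_k$ vanish; round-off negatives are clipped to zero before taking square roots, and (25) is checked with $du \to h$.

```python
import numpy as np

n_grid = 100
h = 1.0 / n_grid
t = h * (np.arange(n_grid) + 0.5)
# Illustrative positive *semidefinite* kernel (an assumption): rank-one C.
C = np.array([[1.0, 1.0], [1.0, 1.0]])
K = np.kron(C, np.exp(-np.abs(t[:, None] - t[None, :])))
mu, V = np.linalg.eigh(h * K)
mu = np.clip(mu, 0.0, None)          # mu_k >= 0; remove tiny negative round-off
Phi = V / np.sqrt(h)

sqrt_K = (Phi * np.sqrt(mu)) @ Phi.conj().T   # (24)
reconstructed = h * sqrt_K @ sqrt_K           # (25), with du -> h
print(np.allclose(reconstructed, K), int(np.sum(mu < 1e-12)))
```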


B. A Class of Vector Stochastic Processes

Suppose that $X(t)$ is a zero-mean vector stochastic process which may be written in the form

$$X(s) = \int_a^b A(s, t)Y(t)\,dt, \qquad (26)$$

where $A(s, t)$ is a matrix of nonrandom functions and $Y(t)$ is a wide-sense stationary vector stochastic process. Suppose further that $Y(t)$ is analytic,¹ so we may write the Hilbert transform-pair

$$\mathrm{Im}\,Y(t) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\mathrm{Re}\,Y(u)}{t - u}\,du, \qquad (27)$$

$$\mathrm{Re}\,Y(t) = -\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\mathrm{Im}\,Y(u)}{t - u}\,du, \qquad (28)$$

where the principal values of the integrals are implied. Then, using the assumed stationarity of $Y(t)$, it is easily shown from (27) and (28) that¹¹


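$$E[\mathrm{Re}\,Y(s)\,\mathrm{Re}\,Y'(t)] = E[\mathrm{Im}\,Y(s)\,\mathrm{Im}\,Y'(t)] \qquad (29)$$

and

$$E[\mathrm{Re}\,Y(s)\,\mathrm{Im}\,Y'(t)] = -E[\mathrm{Im}\,Y(s)\,\mathrm{Re}\,Y'(t)]. \qquad (30)$$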


11 Note that (29)-(31) do not hold for every analytic vector $Y(t)$; an example to which they do not apply is the one-dimensional vector $\exp[j(\omega t + \phi)]$ where $\phi$ is not uniformly distributed.


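That is,

$$E[Y(s)Y'(t)] = 0, \qquad (31)$$

and hence

$$E[X(s)X'(t)] = \int_a^b\!\int_a^b A(s, u)\,E[Y(u)Y'(v)]\,A'(t, v)\,du\,dv = 0. \qquad (32)$$

Then, if $X(t)$ is expanded according to (3), we have from (4)

$$E[a_k a_l] = \int_a^b\!\int_a^b \Phi_k'^{*}(s)\,E[X(s)X'(t)]\,\Phi_l^{*}(t)\,ds\,dt = 0, \qquad (33)$$

so that, on setting $l = k$ in (33) and using (7),

$$E[(\mathrm{Re}\,a_k)^2] = E[(\mathrm{Im}\,a_k)^2] = \tfrac{1}{2}\mu_k \qquad (34)$$

and

$$E[\mathrm{Re}\,a_k\,\mathrm{Im}\,a_k] = 0. \qquad (35)$$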
Thus, not only are the expansion coefficients uncorrelated with one another [see (7)], but the real and imaginary parts of each are uncorrelated and have equal variances.

If we now assume that the real and imaginary parts of the component waveforms of the vector $X(t)$ are jointly Gaussian, then it follows from (4) that the real and imaginary parts of the expansion coefficients are also jointly Gaussian. From (7), (34) and (35) we therefore have for the joint density distribution of $\{\mathrm{Re}\,a_k\}$ and $\{\mathrm{Im}\,a_k\}$:


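$$\mathrm{pr}[\mathrm{Re}\,a_1, \mathrm{Im}\,a_1, \mathrm{Re}\,a_2, \mathrm{Im}\,a_2, \ldots] = c\,\exp\Big[-\sum_k \frac{(\mathrm{Re}\,a_k)^2 + (\mathrm{Im}\,a_k)^2}{\mu_k}\Big] = c\,\exp\Big[-\sum_k \frac{|a_k|^2}{\mu_k}\Big], \qquad (36)$$

where

$$c = \prod_k \frac{1}{\pi\mu_k}. \qquad (37)$$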

Letting $J(s, t)$ be the inverse, in the sense of (13), of $K(s, t)$ and applying relationships of the forms of (22) and (23) to the exponent of (36), we may rewrite the joint density of (36) as

$$\mathrm{pr}[X(t)] = c\,\exp\Big[-\int_a^b\!\int_a^b X'^{*}(s)J(s, t)X(t)\,ds\,dt\Big], \qquad (38)$$

provided $J(s, t)$ exists. It should be carefully noted that the "density" $\mathrm{pr}[X(t)]$ of (38) is only a shorthand notation for the joint density of the real and imaginary parts of the expansion coefficients of $X(t)$.
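The equality of the exponents of (36) and (38) is an instance of (22)-(23). The sketch below assumes, for illustration only, the kernel $K(s,t) = e^{-|s-t|}\,C$ on $[0, 1]$ with a fixed Hermitian positive-definite $C$; it draws coefficients with the statistics (7), (34) and (35), builds $X(t)$ from (3), recovers the $a_k$ from (4), and compares $\sum_k |a_k|^2/\mu_k$ with the discretized double integral in (38).

```python
import numpy as np

rng = np.random.default_rng(1)
n_grid = 120
h = 1.0 / n_grid
t = h * (np.arange(n_grid) + 0.5)
C = np.array([[1.0, 0.5j], [-0.5j, 2.0]])        # illustrative assumption
K = np.kron(C, np.exp(-np.abs(t[:, None] - t[None, :])))
mu, V = np.linalg.eigh(h * K)
Phi = V / np.sqrt(h)

# Coefficients with Re a_k, Im a_k independent N(0, mu_k / 2), as in (34)-(35)
a = np.sqrt(mu / 2.0) * (rng.standard_normal(mu.size) + 1j * rng.standard_normal(mu.size))
X = Phi @ a                                       # expansion (3)

a_back = h * Phi.conj().T @ X                     # (4)
exponent_36 = np.sum(np.abs(a_back) ** 2 / mu)    # exponent of (36)
J = (Phi / mu) @ Phi.conj().T                     # (13)
exponent_38 = np.real(h * h * X.conj() @ J @ X)   # double integral of (38), ds dt -> h^2
log_c = -np.sum(np.log(np.pi * mu))               # log of (37)
print(np.allclose(a_back, a), exponent_36, exponent_38)
```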




III. Forma SouuTIon oF THE PROBLEM 


We may now establish a formal solution to the problem posed in the introductory section.¹² Adopting the viewpoint of Woodward and Davies¹³ and Kotel’nikov,¹⁴ we recognize that the task of the receiver is to calculate the a posteriori probabilities, Pr [ξ_m(t)/Z(t)], that the mth signal was sent, given that the received vector Z(t) was observed. The largest of these M probabilities is then found, and the corresponding value of m is given as the receiver output in Fig. 1.

Now, the required a posteriori probabilities are, by Bayes’ equality,

$$\Pr[\xi_m(t)/Z(t)] = \frac{P_m\,\mathrm{pr}[Z(t)/\xi_m(t)]}{\mathrm{pr}[Z(t)]}, \qquad (39)$$

where P_m is the a priori probability of transmission of ξ_m(t). The “densities” pr [Z(t)/ξ_m(t)] and pr [Z(t)] are taken in the sense used in the previous section. We assume that the a priori probabilities, P_m, are known to the receiver; then, since pr [Z(t)] does not depend on m, the receiver’s task reduces to the evaluation of the likelihoods

$$\Lambda_m = \mathrm{pr}[Z(t)/\xi_m(t)], \qquad (40)$$

i.e., the probability “densities” of receiving Z(t), assuming that ξ_m(t) was transmitted.

As a first step toward finding an expression for these likelihoods, let us rewrite (2) as

$$Z(t) = \xi_m(t)\,\Gamma_1(t) + \xi_m(t)\,\Gamma_0(t) + N(t). \qquad (41)$$


Here we have split the transmission vector, Γ(t), into two parts, Γ_1(t) and Γ_0(t), the first random and the second nonrandom. The latter is defined to be the mean of Γ(t), i.e., Γ_0(t) = E[Γ(t)]; it is assumed known to the receiver.¹⁵ Notice now that for the purpose of computing Λ_m of (40), the receiver must assume that ξ_m(t) was transmitted. Under this assumption, it knows fully the second term in (41). Let us therefore form a new vector,


$$W(t) = Z(t) - \xi_m(t)\,\Gamma_0(t), \qquad (42)$$

which is the random part of Z(t). The probability, assuming ξ_m(t) sent, that Z(t) is received, is then simply the probability that ξ_m(t)Γ_1(t) + N(t) can be equal to the new vector defined in (42). That is,

$$\Lambda_m = \mathrm{pr}[\xi_m(t)\,\Gamma_1(t) + N(t) = W(t)/\xi_m(t)]. \qquad (43)$$


12 T. Kailath has also considered this problem, using a different technique. See “Optimum Diversity Combiners,” Research Lab. of Electronics, M.I.T., Cambridge, Mass., Quart. Progr. Rept., pp. 8-200; July 15, 1960.

13 P. M. Woodward and I. L. Davies, “Information theory and inverse probability in telecommunications,” Proc. IEE, vol. 99, pt. III, pp. 37–44; March, 1952.

14 V. A. Kotel’nikov, “The Theory of Optimum Noise Immunity,” McGraw-Hill Book Co., Inc., New York, N. Y.; 1959.

15 Physically, Γ_1(t) may represent, say, a scatter-transmission mode in the transmission medium, and Γ_0(t), a purely reflective or refractive mode of known properties.




In order to calculate the probability “density” of (43), we first consider the conditional “density” pr [N(t) = W(t) − ξ_m(t)Γ_1(t)/ξ_m(t), Γ_1(t)], i.e., the probability that the noise can take on the form W(t) − ξ_m(t)Γ_1(t), where Γ_1(t) is temporarily assumed to be known. If we suppose that the noise vector is independent of Γ_1(t), and is a Gaussian process of the type described in section II-B,¹⁶ we can, from (38), immediately write an expression for this conditional probability:


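$$\mathrm{pr}[N(t) = W(t) - \xi_m(t)\Gamma_1(t)/\xi_m(t), \Gamma_1(t)] = c_N \exp\Big\{-\int_0^T\!\!\int_0^T [W(s) - \xi_m(s)\Gamma_1(s)]'^{*}\,Q(s,t)\,[W(t) - \xi_m(t)\Gamma_1(t)]\,ds\,dt\Big\}. \qquad (44)$$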


Here Q(s, t) is the inverse, assumed to exist in the sense of (10)–(13), of the covariance-function matrix of N(t), c_N is a constant of the form of (37), and (0, T) is the interval during which the receiver input is observed.

In order to simplify (44), let us make use of the fact that Q(s, t), being the inverse of a covariance-function matrix, has a square root in the sense of (25). Then, if we define two new vectors,


$$U(s) = \int_0^T \sqrt{Q}(s,t)\,W(t)\,dt \qquad (45)$$

and

$$V(s) = \int_0^T \sqrt{Q}(s,t)\,\xi_m(t)\,\Gamma_1(t)\,dt, \qquad (46)$$


we may rewrite the exponent of (44) as

$$-\int_0^T U'^{*}(t)\,U(t)\,dt - \int_0^T V'^{*}(t)\,V(t)\,dt + 2\operatorname{Re}\int_0^T V'^{*}(t)\,U(t)\,dt. \qquad (47)$$


In going from (44) to (47), we have made an obvious expansion of the integrand, and have invoked the relationship

$$\sqrt{Q}(t,s) = [\sqrt{Q}(s,t)]'^{*}, \qquad (48)$$

which follows easily from the properties of covariance-function matrices, hence of their inverses and of the square roots of these latter.
As a next step, let us expand V(t) in a series of the form of (3):

$$V(t) = \sum_k \eta_k\,\Psi_k(t). \qquad (49)$$


16 Such will be the case, for example, if the noises in the several links are correlated, wide-sense stationary processes. We then identify Y(t) in (26) with N(t) and let A(s, t) be a diagonal matrix of Dirac delta functions.




According to the results of section II-A, we then have that

$$\eta_k = \int_0^T \Psi_k'^{*}(t)\,V(t)\,dt, \qquad (50)$$


and the Ψ_k(t) are orthonormalized vector eigenfunctions of

$$\int_0^T R_m(s,t)\,\Psi_k(t)\,dt = \sigma_k\,\Psi_k(s), \qquad 0 \le s \le T, \qquad (51)$$

where

$$R_m(s,t) = E[V(s)\,V'^{*}(t)]. \qquad (52)$$


Note that the dependence, through (46), of R_m(s, t) on ξ_m(t) has been made explicit by use of a subscript.
If we define

$$\theta_k = \int_0^T \Psi_k'^{*}(t)\,U(t)\,dt, \qquad (53)$$

and substitute (49) and (53) into (47) and the result into (44), we obtain


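$$\mathrm{pr}[N(t) = W(t) - \xi_m(t)\Gamma_1(t)/\xi_m(t), \eta_1, \eta_2, \dots] = c_N \exp\Big[-\int_0^T U'^{*}(t)\,U(t)\,dt - \sum_k |\eta_k|^2 + 2\operatorname{Re}\sum_k \eta_k^{*}\theta_k\Big]. \qquad (54)$$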
In (54), we have recognized that knowledge of the η_k’s is equivalent to knowledge of Γ_1(t) if ξ_m(t) is known.

Λ_m of (43) may now be obtained by averaging the conditional probability (54) over the η_k. If we assume that V(t) is a Gaussian vector process of the type described in section II-B,¹⁷ then, following (36), we have for the joint distribution of the η_k:


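$$\mathrm{pr}[\operatorname{Re}\eta_1, \operatorname{Im}\eta_1, \operatorname{Re}\eta_2, \operatorname{Im}\eta_2, \dots] = c_\Gamma \exp\Big[-\sum_k \frac{|\eta_k|^2}{\sigma_k}\Big]. \qquad (55)$$

The constant c_Γ is given by

$$c_\Gamma = \prod_k \frac{1}{\pi\sigma_k}, \qquad (56)$$

and

$$\sigma_k = \int_0^T\!\!\int_0^T \Psi_k'^{*}(s)\,R_m(s,t)\,\Psi_k(t)\,ds\,dt \qquad (57)$$

is the eigenvalue of (51) corresponding to the solution Ψ_k(t).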

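On multiplying (54) by (55) and integrating on {Re η_k} and {Im η_k}, we finally obtain for the desired likelihoods:

$$\Lambda_m = c_N \prod_k (1+\sigma_k)^{-1} \exp\Big[-\int_0^T U'^{*}(t)\,U(t)\,dt + \sum_k \frac{\sigma_k|\theta_k|^2}{1+\sigma_k}\Big]. \qquad (58)$$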


17 This will occur, for example, if Γ_1(t) is composed of correlated, wide-sense stationary, Rayleigh-fading components; we may then identify √Q(s, t)ξ_m(t) with A(s, t) in (26) [cf. (46)].




This is a formal solution of the optimum-receiver problem, in which the parameter m appears on the right-hand side implicitly in U(t), σ_k and θ_k [see (42), (45), (52), (53) and (57)].
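The step from (54)–(56) to (58) rests on a Gaussian integral over each η_k. The minimal Python check below, assuming a single coefficient with illustrative values of θ_k and σ_k (the helper names are hypothetical), compares a brute-force two-dimensional Riemann sum with the closed-form factor (1 + σ_k)⁻¹ exp[σ_k|θ_k|²/(1 + σ_k)] that appears in (58).

```python
import math


def averaged_factor_numeric(theta, sigma, half_width=8.0, steps=300):
    """Riemann-sum average of exp(-|eta|^2 + 2 Re(eta* theta)), the eta-dependent
    part of (54), over the complex Gaussian density of (55)-(56) for one eta_k."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        for j in range(steps + 1):
            eta = complex(x, -half_width + j * h)
            density = math.exp(-abs(eta) ** 2 / sigma) / (math.pi * sigma)
            conditional = math.exp(-abs(eta) ** 2 + 2.0 * (eta.conjugate() * theta).real)
            total += density * conditional
    return total * h * h


def averaged_factor_closed_form(theta, sigma):
    """Per-coefficient factor of (58): exp(sigma |theta|^2 / (1 + sigma)) / (1 + sigma)."""
    return math.exp(sigma * abs(theta) ** 2 / (1.0 + sigma)) / (1.0 + sigma)


if __name__ == "__main__":
    theta, sigma = 0.3 + 0.4j, 0.5
    print(averaged_factor_numeric(theta, sigma), averaged_factor_closed_form(theta, sigma))
```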

It is desirable, however, to eliminate the artifices of the mathematical derivation, i.e., the eigenvectors Ψ_k(t) implicit in the θ_k’s, and the eigenvalues σ_k. Such a procedure will replace mathematical artificialities with physically meaningful entities, and will obviate the necessity for solving the vector integral equation (51).

First, let us take the logarithm of both sides of (58):

$$\ln \Lambda_m = \ln c_N - \sum_k \ln(1+\sigma_k) - \int_0^T U'^{*}(t)\,U(t)\,dt + \sum_k \frac{\sigma_k|\theta_k|^2}{1+\sigma_k}. \qquad (59)$$
k 
The first term on the right in this expression depends neither on the receiver input nor on the transmitted-waveform index; it therefore need not concern us further. The second term does depend on m, although not on the received signal. If all the eigenvalues σ_k are less than unity,¹⁸ we may write


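$$B_m \equiv -\sum_k \ln(1+\sigma_k) = \sum_{n=1}^{\infty} \frac{(-1)^n}{n} \sum_k \sigma_k^{\,n}, \qquad (60)$$

and, summing on k first with the use of a relationship of the form of (17), we obtain

$$B_m = \sum_{n=1}^{\infty} \frac{(-1)^n}{n} \int_0^T \operatorname{tr} R_m^{(n)}(t,t)\,dt. \qquad (61)$$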


In (61), R_m^(n)(s, t) is the nth iteration of R_m(s, t), defined in the manner of (14) and (15), and tr denotes the matrix trace. A fuller discussion of the evaluation of B_m in terms of R_m(s, t) is given by Middleton for the single-diversity case; his discussion carries over completely to the multidiversity case, however, if all covariance functions are replaced by covariance-function matrices and the matrix trace is taken of the appropriate results.¹⁹
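The series (60) and the trace form (61) can be checked in a finite-dimensional setting. In the sketch below, which is an illustration rather than a method from the paper, a small symmetric matrix stands in for the kernel R_m(s, t), so that Σ_k σ_k^n becomes tr(Rⁿ); the matrix size, eigenvalues and helper names are assumptions.

```python
import math


def bias_exact(sigmas):
    """B_m = -sum_k ln(1 + sigma_k), the second term of (59)."""
    return -sum(math.log1p(s) for s in sigmas)


def bias_series(sigmas, n_terms):
    """Partial sum of (60); converges when every sigma_k < 1."""
    return sum((-1) ** n / n * sum(s ** n for s in sigmas) for n in range(1, n_terms + 1))


def mat_mul(a, b):
    """Plain-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def trace_power(r, n):
    """tr(R^n): finite-dimensional stand-in for the integral of tr R_m^(n)(t, t) in (61)."""
    p = r
    for _ in range(n - 1):
        p = mat_mul(p, r)
    return sum(p[i][i] for i in range(len(p)))


def rotated_kernel(sigmas, angle):
    """Symmetric 2x2 matrix with eigenvalues sigmas and rotated eigenvectors."""
    c, s = math.cos(angle), math.sin(angle)
    q = [[c, -s], [s, c]]
    d = [[sigmas[0], 0.0], [0.0, sigmas[1]]]
    q_t = [[c, s], [-s, c]]
    return mat_mul(mat_mul(q, d), q_t)
```

At small signal-to-noise ratios (footnote 18) the σ_k are small, so a single term of the series, −Σ_k σ_k, already approximates B_m closely.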

The remaining terms of (59), i.e.,

$$S_m = -\int_0^T U'^{*}(t)\,U(t)\,dt + \sum_k \frac{\sigma_k|\theta_k|^2}{1+\sigma_k}, \qquad (62)$$

depend both on the receiver input and on the index m. These terms may be rewritten as follows.


18 In particular, this occurs at small channel signal-to-noise ratios, i.e., when N(t) dominates ξ_m(t)Γ_1(t). For, suppose N(t) is multiplied by a factor of p. V(t) of (46) will then decrease by this same factor, as will the η_k of (50), and the σ_k of (57) will decrease as p². Thus, if we let p → ∞, we will have σ_k → 0, all k. The ensuing series for B_m in (60) and (61) may therefore be expected to converge quite rapidly for very small signal-to-noise ratios.

19 D. Middleton, “An Introduction to Statistical Communication Theory,” McGraw-Hill Book Co., Inc., New York, N. Y.; 1960. See Section 17.1; in particular, the trace of the negative of Middleton’s (17.19), evaluated for λ = 1, is the equivalent of (61) of the present paper. See also D. Middleton, “On the detection of stochastic signals in additive normal noise, I,” IRE Trans. on Information Theory, vol. IT-3, pp. 86–121; June, 1957.




We first note that the matrix Iδ(s − t) + R_m(s, t) is positive definite in the sense of (18), even if R_m(s, t) is not. Hence, if R_m(s, t) in (51) were replaced by Iδ(s − t) + R_m(s, t), the new integral equation would certainly possess a complete set of eigenfunctions. Further, these eigenfunctions would include {Ψ_k(t)} as a subset, and the eigenvalues corresponding to this subset would be {1 + σ_k}. By formally following (13),²⁰ we therefore can write an expansion for the inverse of Iδ(s − t) + R_m(s, t) in the form


$$P(s,t) = \sum_k \frac{\Psi_k(s)\,\Psi_k'^{*}(t)}{1+\sigma_k} + r(s,t), \qquad 0 \le s,\, t \le T, \qquad (63)$$


where the second term, r(s, t), is similar in form to the first, but includes only those eigenfunctions of the new integral equation which are not in, hence are orthogonal to, {Ψ_k(t)}. r(s, t) is clearly zero if R_m(s, t) is positive definite.

Now, following (22), we may write

$$\int_0^T R_m(s,t)\,U(t)\,dt = \sum_k \sigma_k\,\theta_k\,\Psi_k(s), \qquad 0 \le s \le T, \qquad (64)$$


where the θ_k are as in (53). Then, defining a new vector,

$$\hat V(s) = \int_0^T\!\!\int_0^T P(s,u)\,R_m(u,t)\,U(t)\,du\,dt, \qquad (65)$$

and using (63) and (64) in this definition, we have by direct calculation (using the orthonormality of the Ψ_k(t)):


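$$\hat V(s) = \sum_k \frac{\sigma_k\,\theta_k}{1+\sigma_k}\,\Psi_k(s), \qquad 0 \le s \le T. \qquad (66)$$

Use of (53) and (66) easily shows that

$$\sum_k \frac{\sigma_k|\theta_k|^2}{1+\sigma_k} = \int_0^T \hat V'^{*}(t)\,U(t)\,dt, \qquad (67)$$

so that (62) finally becomes

$$S_m = -\int_0^T U'^{*}(t)\,U(t)\,dt + \int_0^T \hat V'^{*}(t)\,U(t)\,dt. \qquad (68)$$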

This last expression, as we shall see, has a most interesting interpretation.

Eq. (68) may be rewritten in another form of interest by invoking the inverse relationship between P(s, t) and Iδ(s − t) + R_m(s, t) to write [see (12)]:

$$U(s) = \int_0^T\!\!\int_0^T P(s,u)\,[I\delta(u-t) + R_m(u,t)]\,U(t)\,du\,dt. \qquad (69)$$

Insertion of (65) and (69) into (68) then yields

$$S_m = -\int_0^T\!\!\int_0^T U'^{*}(s)\,P(s,t)\,U(t)\,ds\,dt. \qquad (70)$$


20 We say “formally” since the new kernel does not satisfy the conditions imposed at the beginning of section II-A. The pertinent results do carry over to the new kernel, however.




A third useful expression for S_m arises from letting

$$T(s) = \int_0^T \sqrt{P}(s,t)\,U(t)\,dt, \qquad (71)$$

where √P(s, t) is defined in the manner of (25). We recognize that √P(s, t) = [√P(t, s)]'*, whence we may easily show that (70) may be expressed as

$$S_m = -\int_0^T T'^{*}(t)\,T(t)\,dt. \qquad (72)$$
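The equivalence of (62), (68) and (70) can be verified in a two-dimensional stand-in for the function space: R_m becomes a 2 × 2 matrix built from real orthonormal vectors ψ_k and eigenvalues σ_k, P becomes (I + R)⁻¹, and the integrals become finite sums. This Python sketch assumes that discretization; the helper names are hypothetical.

```python
import math


def inner(x, y):
    """Discrete analogue of the integral of X'*(t) Y(t) dt."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))


def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def three_forms(sigmas, angle, u):
    """S_m computed via (62), (68) and (70) for a 2-dimensional stand-in of R_m."""
    c, s = math.cos(angle), math.sin(angle)
    psi = [[c, s], [-s, c]]  # orthonormal eigenvectors psi_k (rows)
    r = [[sum(sigmas[k] * psi[k][i] * psi[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    p = inverse_2x2([[(1.0 if i == j else 0.0) + r[i][j] for j in range(2)] for i in range(2)])
    energy = inner(u, u).real                            # integral of U'* U
    thetas = [inner(psi[k], u) for k in range(2)]        # theta_k of (53)
    s62 = -energy + sum(sg * abs(th) ** 2 / (1.0 + sg) for sg, th in zip(sigmas, thetas))
    v_hat = mat_vec(p, mat_vec(r, u))                    # V-hat of (65)
    s68 = -energy + inner(v_hat, u)
    s70 = -inner(u, mat_vec(p, u))
    return s62, s68, s70
```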


To recapitulate the results of this section, we recall that the task of the receiver is to find the value of m for which (39) is the largest. This may be done by finding the value of m which maximizes the quantity ln P_m + ln Λ_m, or, equivalently [cf. (59), (60) and (62)], which maximizes S_m + B'_m, where

$$B'_m = \ln P_m + B_m = \ln P_m - \sum_k \ln(1+\sigma_k). \qquad (73)$$
k 


The biases, B'_m, of (73) do not depend on the received signal; they may be calculated once and for all by means of (60) or (61). S_m, which does depend on the received signal, may be calculated by means of (62), (68), (70) or (72). Let us now consider physical interpretations of these mathematical results.
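The decision rule just summarized, choosing m to maximize S_m + B'_m with B'_m from (73), can be sketched as follows. The inputs are assumed to be precomputed: values of S_m from any of (62), (68), (70) or (72), the priors P_m, and a (possibly truncated) list of eigenvalues σ_k for each hypothesis. Indices are 0-based, unlike the paper's m, and the function names are hypothetical.

```python
import math


def bias(prior, sigmas):
    """B'_m of (73): ln P_m - sum_k ln(1 + sigma_k)."""
    return math.log(prior) - sum(math.log1p(s) for s in sigmas)


def decide(s_values, priors, eigenvalues):
    """Index (0-based) of the hypothesis maximizing S_m + B'_m, plus all scores."""
    scores = [s + bias(p, sig) for s, p, sig in zip(s_values, priors, eigenvalues)]
    best = max(range(len(scores)), key=lambda m: scores[m])
    return best, scores
```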


IV. INTERPRETATIONS OF THE RESULTS 


In the foregoing analysis, we have often encountered two types of operation:

$$Y(s) = \int_0^T H(s,t)\,X(t)\,dt \qquad (74)$$

and

$$\int_0^T X'^{*}(t)\,Y(t)\,dt. \qquad (75)$$

Let us therefore consider these in some detail.

The first is a linear operation on X(t), which may be interpreted in terms of a “matrix” filter. This filter has, say, q inputs, x_j(t) (j = 1, ···, q), which are the components of X(t), and q outputs, y_i(s) (i = 1, ···, q), which are the components of Y(s). From (74), these inputs and outputs are related by

$$y_i(s) = \sum_{j=1}^{q} \int_0^T h_{ij}(s,t)\,x_j(t)\,dt, \qquad (76)$$

where h_ij(s, t) is the ijth element of the q × q matrix H(s, t), and represents a time-varying impulse-response function giving the influence of the jth input on the ith output.
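A sampled version of the matrix filter (76) replaces each integral by a Riemann sum. The sketch below assumes uniform sampling with spacing dt on both the s and t axes and represents each h_ij(s, t) as a Python function of the sample indices; it is an illustration of (76), not a physical filter design, and its names are hypothetical.

```python
def matrix_filter(h, x, dt):
    """Riemann-sum discretization of (76).

    h[i][j](n, k) is the sampled impulse response h_ij(s_n, t_k);
    x[j][k] is the k-th sample of input x_j(t); dt is the sample spacing.
    Returns y with y[i][n] approximating y_i(s_n)."""
    q = len(x)
    n_samples = len(x[0])
    return [[sum(h[i][j](n, k) * x[j][k] for j in range(q) for k in range(n_samples)) * dt
             for n in range(n_samples)]
            for i in range(q)]


if __name__ == "__main__":
    dt = 0.5

    def delta(n, k):
        """Sampled Dirac delta: 1/dt on the diagonal, zero elsewhere."""
        return 1.0 / dt if n == k else 0.0

    def zero(n, k):
        return 0.0

    # Cross-coupled filter: output 1 copies input 2 and vice versa.
    print(matrix_filter([[zero, delta], [delta, zero]], [[1.0, 2.0], [3.0, 4.0]], dt))
```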

Note that (74) and (76), interpreted literally, do not always represent a physical operation, for we have generally been considering cases in which X(t), Y(t) and H(s, t) are complex. However, in the physical interpretations of our results which follow, we shall find that we are actually only interested in the real parts of expressions of the form of (74), i.e.,

$$\operatorname{Re} Y(s) = \int_0^T \operatorname{Re} H(s,t)\,\operatorname{Re} X(t)\,dt - \int_0^T \operatorname{Im} H(s,t)\,\operatorname{Im} X(t)\,dt. \qquad (77)$$
J0 


Further, we shall see that in all cases we have considered, H(s, t) is “analytic” in the sense that H'*(t, s) = H(s, t) and

$$\operatorname{Im} H(s,t) = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\operatorname{Re} H(u,t)}{s-u}\,du = -\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\operatorname{Re} H(s,u)}{t-u}\,du, \qquad (78)$$

where the principal values of the integrals are implied.

Then, if X(t) is analytic in the sense of (27) and (28), it is easy to show by direct calculation that the two terms in (77) are approximately equal, i.e.,²¹

$$\operatorname{Re} Y(s) \approx \int_0^T [2\operatorname{Re} H(s,t)]\,\operatorname{Re} X(t)\,dt. \qquad (79)$$

That is, the real part of the output may be calculated 
through the use only of the real part of the input and 
the real part of the filter matrix. We shall henceforth 
depict an operation of the form of (79) as in Fig. 2, with 
the understanding that only the real parts of all complex 
quantities are meant. 
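The approximation (79) can be checked numerically for a narrow-band case. The sketch assumes a kernel H(s, t) = h(s − t) exp[jω₀(s − t)] and an input X(t) = x(t) exp(jω₀t) with slowly varying Gaussian envelopes (all parameter values are illustrative, not taken from the paper); it evaluates Re Y(s) exactly and by the real-part-only rule (79). The neglected double-frequency term is what makes (79) only approximate for signals that are not narrow-band.

```python
import cmath
import math


def narrowband_check(s=0.5, f0=200.0, width=0.05, n=20000, T=1.0):
    """Return (exact Re Y(s), approximation (79)) for hypothetical Gaussian
    envelopes on a carrier of frequency f0, using a Riemann sum over (0, T)."""
    w0 = 2.0 * math.pi * f0
    dt = T / n
    exact = 0.0
    approx = 0.0
    for k in range(n):
        t = k * dt
        h = math.exp(-((s - t) / width) ** 2) * cmath.exp(1j * w0 * (s - t))  # H(s, t)
        x = math.exp(-((t - 0.5) / 0.2) ** 2) * cmath.exp(1j * w0 * t)       # X(t)
        exact += (h * x).real * dt                # Re of the integrand of (74)
        approx += 2.0 * h.real * x.real * dt      # integrand of (79)
    return exact, approx


if __name__ == "__main__":
    print(narrowband_check())
```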

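In the case of the second type of operation, (75), we shall again find that in all cases it is the real part,

$$\operatorname{Re}\int_0^T X'^{*}(t)\,Y(t)\,dt = \int_0^T \operatorname{Re} X'(t)\,\operatorname{Re} Y(t)\,dt + \int_0^T \operatorname{Im} X'(t)\,\operatorname{Im} Y(t)\,dt, \qquad (80)$$

in which we are interested. If X(t) and Y(t) are analytic, the two terms in (80) are approximately equal,²¹ so we may write

$$\operatorname{Re}\int_0^T X'^{*}(t)\,Y(t)\,dt \approx 2\int_0^T \operatorname{Re} X'(t)\,\operatorname{Re} Y(t)\,dt = 2\int_0^T \Big[\sum_l \operatorname{Re} x_l(t)\,\operatorname{Re} y_l(t)\Big]\,dt. \qquad (81)$$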


We shall depict (81) schematically as in Fig. 3, with the 
understanding that it is the real parts of the vectors 
X(t) and Y(t) which we must actually multiply together 
component by component. 

With these points in mind, we may proceed to draw block diagrams of the ideal receiver. We seek to compute the quantity S_m + B'_m for each value of m, where S_m may be obtained from the receiver input vector Z(t) by the sequence of operations (42), (45), (65) and (68), or by the sequence (42), (45), (71), (72). Note that each of these sequences terminates in the evaluation of an integral of the form of (75). Further, since it may be shown²²


21 This statement, as well as others involved in the physical interpretation of the optimum-receiver equations, is an approximation, due to the fact that a finite time interval is being considered, rather than an infinite one. However, the statements are good approximations in the case most of interest, when the signals are narrow-band, i.e., have center frequencies large compared VOLyatie Cf. the discussion of truncation in Appendix III.

22 See Appendix I.


Fig. 2. A matrix filter.

Fig. 3. A vector correlator.


that U(t), V̂(t) and T(t) in these integrals are analytic, and since S_m must of necessity be real, the evaluation may be performed using real parts, in the manner of (81). But the vectors U(t), V̂(t) and T(t) are ultimately derived from Z(t) by sequences of complex matrix filtering of the form of (74); it is shown in Appendix I that the desired real parts of these vectors may be computed by corresponding sequences of real filtering operations of the form of (79). Therefore, following the convention adopted in Figs. 2 and 3, we may depict the computation of S_m + B'_m as in Fig. 4 if the sequence of operations (42), (45), (65), (68) is used, and as in Fig. 5 if the sequence (42), (45), (71), (72) is used. In Fig. 4 we have used the notation [see (65)]

$$O(s,t) = \int_0^T P(s,u)\,R_m(u,t)\,du, \qquad (82)$$

and in Fig. 5 we have combined the filtering operations of (45) and (71) by letting

$$\sqrt{P}\sqrt{Q}(s,t) = \int_0^T \sqrt{P}(s,u)\,\sqrt{Q}(u,t)\,du. \qquad (83)$$

Since it is understood in these block diagrams that the real parts of all complex quantities are meant, we note in particular that the receiver input in both cases is Re Z(t), the actual set of received physical waveforms.

Fig. 4. The computation of S_m + B'_m.

Fig. 5. Another way of computing S_m + B'_m.

Fig. 4 has a particularly interesting and enlightening interpretation in terms of well-known results for optimum correlation reception through a channel disturbed only by additive, white, Gaussian noise.¹³,¹⁴ In our case, of course, the additive disturbance in Z(t) (i.e., N(t)), although Gaussian, is not white; but it is easy to see that the filter √Q(s, t) has the effect of whitening this additive disturbance. For, note from (41), (42), (45) and (46) that we may write the output of this filter as

$$U(t) = V(t) + \int_0^T \sqrt{Q}(t,u)\,N(u)\,du, \qquad (84)$$

where the first term on the right is due to the transmitted signal, and the second term is an additive noise. The covariance-function matrix of this additive noise is²³

$$\int_0^T\!\!\int_0^T \sqrt{Q}(s,u)\,E[N(u)\,N'^{*}(v)]\,[\sqrt{Q}(t,v)]'^{*}\,du\,dv = I\,\delta(s-t). \qquad (85)$$

That is, the elements of the noise component of the vector U(t) are stationary, uncorrelated (and hence, because they are Gaussian, independent), and have identical white power spectra.

Were this noise component the only disturbance in U(t), we should expect that the remainder of the receiver would be a multidiversity analog of a correlation receiver,¹³,¹⁴ in which U(t) would be correlated with its signal component, V(t), which would be known to the receiver. Unfortunately, the signal component is not known to the receiver. Similar work of Price²⁴ and Kailath,¹²,²⁵ suggests, however, that the receiver makes up for this lack by estimating V(t). This indeed turns out to be the case; it is shown in Appendix II that the output of the filter O(s, t) in Fig. 4, i.e., V̂(t), is an optimum estimate of V(t) in both the maximum-probability and minimum-variance senses.

23 Eq. (85) follows immediately from (48) and from the fact that Q(s, t) is the inverse, in the sense of (12), of E[N(s)N'*(t)].

24 R. Price, “Optimum detection of random signals in noise, with application to scatter-multipath communication, I,” IRE Trans. on Information Theory, vol. IT-2, pp. 125–135; December, 1956.
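A finite-dimensional analogue of (84)–(85), restricted to the cross-link correlation at a single instant, shows the whitening idea: multiplying the noise vector by the inverse of a factor L with LLᴴ = K turns its covariance K into the identity. The sketch below uses a Cholesky factor for this purpose; that is a swapped-in choice for illustration, not the Hermitian square root √Q(s, t) used in the paper, although either factor whitens the noise. The covariance values and helper names are hypothetical.

```python
def cholesky_2x2(k):
    """Lower-triangular L with L L^H = K, for a 2x2 Hermitian positive-definite K."""
    l11 = k[0][0].real ** 0.5
    l21 = k[1][0] / l11
    l22 = (k[1][1].real - abs(l21) ** 2) ** 0.5
    return [[l11, 0.0], [l21, l22]]


def inverse_lower_2x2(l):
    """Inverse of a 2x2 lower-triangular matrix."""
    return [[1.0 / l[0][0], 0.0],
            [-l[1][0] / (l[0][0] * l[1][1]), 1.0 / l[1][1]]]


def mat_mul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)] for i in range(2)]


def hermitian(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def whitened_covariance(k):
    """Covariance of L^{-1} n when n has covariance K, i.e. L^{-1} K L^{-H}."""
    w = inverse_lower_2x2(cholesky_2x2(k))
    return mat_mul(mat_mul(w, k), hermitian(w))
```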

Thus, as Price²⁴ and Kailath¹²,²⁵ have found in other cases, the optimum receiver of Fig. 4 is, after all, an extension of that of Woodward and Davies: after a noise-whitening operation on W(t) to obtain U(t), the receiver performs a correlation operation, given by the second term on the right in (68), in which, for lack of having the true signal component of U(t) available to correlate with U(t), an estimate of this signal component is used. The first term on the right in (68) is analogous to the received-signal energy term in the Woodward-Davies receiver.

A final interpretive point is of great importance. It is clear that the signal component, ξ_m(t)Γ(t), of Z(t) may alternatively be interpreted not as the result of transmitting a known signal through a random medium, but as a stochastic signal with known statistics. Thus, for example, in the binary case we could take ξ_1(t) = 0 and ξ_2(t) = 1; then the problem we have been considering reduces to the detection of a random signal vector, Γ(t), in the presence of random noise.²⁴⁻²⁶ From this point of view, (72) is seen to be related to a result of Wolf.


V. EXAMPLES 


In order to obtain further insight into the nature of the 
solutions depicted in Figs. 4 and 5, we consider below 
two special cases. The first exemplifies a receiver of the 
form of Fig. 5, and the second, one of the form of Fig. 4. 


A. Very Fast Fading 


Let us suppose for simplicity that the input noise vector N(t) is composed of independent, stationary, white noises; this is no great restriction on generality, since we have already seen that the receiver’s noise-whitening filter would establish this state if it were not so. We may, for our purposes, write the noise covariance-function matrix for this case approximately as (see Appendix III)

$$E[N(s)\,N'^{*}(t)] = 2\begin{pmatrix} N_{01} & & 0 \\ & \ddots & \\ 0 & & N_{0L} \end{pmatrix} \delta(s-t), \qquad (86)$$

where N_0l is the single-ended noise power density in the lth diversity link. Then


$$Q(s,t) = q\,\delta(s-t), \qquad (87)$$

where

$$q = \frac{1}{2}\begin{pmatrix} 1/N_{01} & & 0 \\ & \ddots & \\ 0 & & 1/N_{0L} \end{pmatrix}. \qquad (88)$$

From (45) and (46) we therefore have

$$V(s) = \sqrt{q}\,\xi_m(s)\,\Gamma_1(s) \qquad (89)$$

and

$$U(s) = \sqrt{q}\,W(s). \qquad (90)$$

25 T. Kailath, “Correlation detection of signals perturbed by a random channel,” IRE Trans. on Information Theory, vol. IT-6, pp. 361–366; June, 1960.

26 R. C. Davis, “The detectability of random signals in the presence of noise,” IRE Trans. on Information Theory, vol. IT-3, pp. 52–62; March, 1954.


If we now make the further simplifying assumption that Γ_1(t) is stationary, then R_m(s, t) of (52) is of the form

$$R_m(s,t) = \xi_m(s)\,\xi_m^{*}(t)\,\sqrt{q}\,G(s-t)\,\sqrt{q}, \qquad (91)$$

where

$$G(s-t) = E[\Gamma_1(s)\,\Gamma_1'^{*}(t)]. \qquad (92)$$

In order to define the condition for fast fading, we assume that Γ_1(t) varies so very much faster than the transmitted signal, ξ_m(t), that whenever Γ_1(t) and ξ_m(t) appear in a product, as in (91), we may make the approximation (as far as the ξ_m(t) are concerned)²⁷

$$G(s-t) \approx g\,\delta(s-t), \qquad (93)$$

where g is a Hermitian matrix of constants, the klth element of which is twice the cross-power density of the fadings in the kth and lth links. Using (93), (91) becomes


$$R_m(s,t) = \xi_m(s)\,\xi_m^{*}(t)\,C\,\delta(s-t), \qquad (94)$$

where C = √q g √q. Further,

$$I\delta(s-t) + R_m(s,t) = [I + |\xi_m(s)|^2 C]\,\delta(s-t), \qquad (95)$$

the inverse of which is [see (10)–(12)]

$$P(s,t) = [I + |\xi_m(s)|^2 C]^{-1}\,\delta(s-t). \qquad (96)$$


We now use (70) to calculate S_m:

$$S_m = -\int_0^T W'^{*}(t)\,\sqrt{q}\,[I + |\xi_m(t)|^2 C]^{-1}\,\sqrt{q}\,W(t)\,dt. \qquad (97)$$

To investigate further the nature of (97), let us now assume that g is diagonal, i.e., that the link fadings are independent. Then (97) becomes

$$S_m = -\sum_{l=1}^{L} \frac{1}{2N_{0l}} \int_0^T \frac{|w_l(t)|^2}{1 + g_l|\xi_m(t)|^2/N_{0l}}\,dt, \qquad (98)$$


27 Note that (93) assumes that, if there is link-to-link correlation 
of fading, the correlation only exists at simultaneous instants of 
time. 

Strictly speaking, the covariance-function matrix of (93) violates 
the hypotheses of the mathematical formalism upon which our 
solutions are based. One may nonetheless justify its use by an 
argument of physical continuity: if a reasonable result is obtained 
by formally inserting (93) into the optimum-receiver equations, 
this must be an approximation to the result which would be obtained 
by using any valid covariance-function matrix to which (93) is 
an approximation. 




where w_l(t) is the lth element of W(t) and 2g_l is the lth diagonal element of g. Recall from (42) that w_l(t) is the random part of the lth receiver input, the known signal component having been removed. In the fast-fading case we have been considering, it is clear that the phase of w_l(t) bears no relationship to the phase of the transmitted signal, since signal phase has been randomized instant by instant by the transmission medium. Thus, we expect that the phase of w_l(t), carrying no information about the transmitted signal, will not appear in the optimum-receiver expression when the fadings are independent.²⁸ This expectation is verified by (98), which is purely energetic in nature: only the instantaneous powers, |w_l(t)|², of the received signals enter, suitably weighted. Note that the weighting functions, −[1 + g_l|ξ_m(t)|²/N_0l]⁻¹, increase monotonically (although weakly) with the instantaneous signal-to-noise ratios,²⁹ g_l|ξ_m(t)|²/N_0l, so that the stronger diversity links are emphasized, and these at the most favorable instants.
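The fast-fading statistic (98) can be evaluated on sampled complex envelopes with a simple Riemann sum. The Python sketch below assumes uniformly spaced samples and independent link fadings; the function and argument names are hypothetical. Because only |w_l(t)|² and |ξ_m(t)|² enter, rotating the phase of any sample leaves the result unchanged, which is the energetic character noted above.

```python
def fast_fading_statistic(w, xi, g, n0, dt):
    """Riemann-sum discretization of (98), independent fast fading.

    w[l]: samples of the complex envelope w_l(t) of link l; xi: samples of xi_m(t);
    g[l]: g_l (half the l-th diagonal element of g); n0[l]: N_0l; dt: sample spacing."""
    s_m = 0.0
    for w_l, g_l, n0_l in zip(w, g, n0):
        acc = 0.0
        for w_lk, xi_k in zip(w_l, xi):
            # 1 / (1 + instantaneous SNR); the paper's weighting function is its negative.
            weight = 1.0 / (1.0 + g_l * abs(xi_k) ** 2 / n0_l)
            acc += abs(w_lk) ** 2 * weight
        s_m -= acc * dt / (2.0 * n0_l)
    return s_m
```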

There remains the calculation of the biases, B'_m, of (73). Note that these depend only on the a priori transmission probabilities, P_m, and on the eigenvalues, σ_k, of the integral equation (51). In the present case, the kernel of the equation is given by (91), where we have assumed that as far as the ξ_m(t) are concerned, G(s − t) is approximately as in (93). Therefore, we may write the integral equation approximately as

$$\int_0^T \xi_m(s)\,\xi_m^{*}(t)\,\sqrt{q}\,g\,\sqrt{q}\,\delta(s-t)\,\Psi_k(t)\,dt = \sigma_k\,\Psi_k(s), \qquad 0 \le s \le T. \qquad (99)$$

Notice in particular that the solutions to this equation, hence the σ_k, depend only on the modulus of ξ_m(t), not on its argument. Thus, the biases in the present case depend only on the amplitude modulations of the transmitted signals, not on their phases. If all the signals are a priori equiprobable and have identical envelopes, then all the biases are the same, and the receiver may make its decision solely on the basis of comparison of the S_m of (97) or (98).

We have already seen¹⁸ that at small channel signal-to-noise ratios the series (61) for the term B_m in the bias B'_m may be expected to converge rapidly. In the event that the first term of the series suffices, we have for the present case [see (91)]

$$B_m \approx -\operatorname{tr}[\sqrt{q}\,G(0)\,\sqrt{q}] \int_0^T |\xi_m(s)|^2\,ds = -2E_m \operatorname{tr}[\sqrt{q}\,G(0)\,\sqrt{q}], \qquad (100)$$

where E_m is the energy in the mth transmitted signal.

28 Of course, when the fadings are dependent, the relative phases of the w_l(t)’s will appear.

29 g_l is the power density of the fading in the lth link, so g_l|ξ_m(t)|² is a measure of the instantaneous power of the signal component at the lth receiver input. N_0l is the noise power density in the lth link,


Thus, at very small signal-to-noise ratios the biases are equal and may be eliminated from the receiver if the transmitted signals are a priori equally probable and have equal energies.

As a very simple example of the application of the above results, let us consider a problem in radiometry: the detection of a wide-band random signal in the presence of thermal noise. For this case, as we have noted before, we may reverse the roles of Γ(t) and ξ_m(t) in (2), associating Γ(t) with the random signal. We then let ξ_1(t) = 0 correspond to the null hypothesis (noise only), and we may let ξ_2(t) correspond to some arbitrary waveform with which we propose to modulate Γ(t) prior to its perturbation by N(t), the receiver’s thermal noise.³⁰ If we consider for simplicity only the single-diversity case, and let Γ_0(t) = 0 (no nonrandom component in Γ(t)), we have from (42) and (98):


$$S_2 - S_1 = \frac{1}{2N_0}\int_0^T \frac{|\zeta(t)|^2}{1 + N_0/(g\,|\xi_2(t)|^2)}\,dt. \qquad (101)$$

That is, the optimum radiometer must compute the correlation between the squared envelope of the effective received waveform, ζ(t), and a monotone-increasing function of the envelope of the modulating waveform, ξ_2(t). At small signal-to-noise ratios, this function is just g|ξ_2(t)|²/N_0. The quantity computed is compared with a threshold determined by the biases, and it is decided that a signal is present if the threshold is exceeded.
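A sampled version of the radiometer statistic (101) is sketched below, assuming single diversity, uniformly spaced samples and a threshold supplied by the caller; the names are hypothetical. From (73), with ξ_1(t) = 0 (so that all σ_k vanish for m = 1), the comparison S_2 + B'_2 > S_1 + B'_1 corresponds to the threshold B'_1 − B'_2 = ln(P_1/P_2) − B_2; computing B_2 is left outside the sketch.

```python
def radiometer_statistic(zeta, xi2, g, n0, dt, small_snr=False):
    """S_2 - S_1 of (101) for single diversity, or its small-SNR form."""
    total = 0.0
    for z, x in zip(zeta, xi2):
        snr = g * abs(x) ** 2 / n0
        # snr / (1 + snr) equals 1 / (1 + N_0 / (g |xi_2|^2)) and stays finite when xi_2 = 0.
        weight = snr if small_snr else snr / (1.0 + snr)
        total += abs(z) ** 2 * weight
    return total * dt / (2.0 * n0)


def signal_present(statistic, threshold):
    """Declare a signal present when the statistic exceeds the bias-determined threshold."""
    return statistic > threshold
```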


B. Very Slow Fading


We now consider the opposite extreme, in which Γ_1(t) varies so very much more slowly than ξ_m(t) that we may ignore its time dependence and denote it simply by Γ_1. We shall again assume for simplicity that N(t) is composed of independent, stationary, white noises, so (87)–(92) will obtain, with Γ_1(t) and G(s − t) in (89), (91) and (92) replaced by Γ_1 and G. In particular, if we let

$$M = \sqrt{q}\,G\,\sqrt{q}, \qquad (102)$$

then

$$R_m(s,t) = \xi_m(s)\,\xi_m^{*}(t)\,M. \qquad (103)$$

It may easily be verified from (12) that the inverse of Iδ(s − t) + R_m(s, t) is


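$$P(s,t) = I\delta(s-t) - \xi_m(s)\,\xi_m^{*}(t)\,(M^{-1} + 2E_m I)^{-1}, \qquad (104)$$

where, as in (100), E_m is the energy in ξ_m(t). On placing (104) into (70), we obtain

$$S_m = -\int_0^T U'^{*}(t)\,U(t)\,dt + Y'^{*}\,(M^{-1} + 2E_m I)^{-1}\,Y, \qquad (105)$$

where we have set

$$Y = \int_0^T \xi_m^{*}(t)\,U(t)\,dt. \qquad (106)$$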


30 R. H. Dicke, “The measurement of thermal radiation at microwave frequencies,” Rev. Sci. Instr., vol. 17, pp. 268–275; July, 1946.




Note that (105) is of the form of (68), and is hence illustrated by Fig. 4.

As for the biases, it is shown in Appendix IV that the term B_m in (73) is

$$B_m = -\ln\det(I + 2E_m M), \qquad (107)$$

from which it is clear that the biases depend only on the energies of the signals and on their a priori probabilities.

Let us now further specialize the results in (105) and (107) by assuming that the link fadings are uncorrelated. Then M of (102) is diagonal, its lth diagonal element being, say, ρ_l/(2N_0l), where ρ_l = E[|γ_1l|²] is the mean-square envelope transmission strength of the random part of the lth transmission link. Using (90) and (106) in (105), we then have, for independent links,

$$S_m = -\sum_{l=1}^{L} \frac{1}{2N_{0l}} \int_0^T |w_l(t)|^2\,dt + \sum_{l=1}^{L} \frac{\rho_l/(4N_{0l}^2)}{1 + \rho_l E_m/N_{0l}} \Big|\int_0^T \xi_m^{*}(t)\,w_l(t)\,dt\Big|^2. \qquad (108)$$

Further, from (107),

$$B_m = -\sum_{l=1}^{L} \ln\Big(1 + \frac{\rho_l E_m}{N_{0l}}\Big). \qquad (109)$$


Recall from (42) that w_l(t) is related to the receiver inputs ζ_l(t) by

$$w_l(t) = \zeta_l(t) - \gamma_{0l}\,\xi_m(t), \qquad (110)$$

where γ_0l is the lth component of Γ_0, the nonrandom part of the transmission vector.³¹ On using (110), and assuming equal noises, N_0l = N_0, all l, (108) reduces to a result given in a previous paper.³³ In that paper it was shown that the physical operations corresponding to (108) comprise matched filtering, sampling, and a combination of coherent and noncoherent detection.

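In particular, if ρ_l = 0, all l (i.e., there is no random transmission component), (108) reduces to

$$S_m = -\sum_{l=1}^{L} \frac{1}{2N_{0l}}\int_0^T |\zeta_l(t)|^2\,dt - \sum_{l=1}^{L} \frac{|\gamma_{0l}|^2 E_m}{N_{0l}} + \operatorname{Re}\sum_{l=1}^{L} \frac{\gamma_{0l}^{*}}{N_{0l}} \int_0^T \xi_m^{*}(t)\,\zeta_l(t)\,dt. \qquad (111)$$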

In (111), the first term is independent of m and may be neglected; the second does not depend on the received signal, and contributes only to the bias. The third term corresponds to the optimal linear diversity combiner of Brennan:³² the received waveforms ζ_l(t) are multiplied by the complex numbers γ*_0l/N_0l, the phases of which place the signal components of the ζ_l(t), viz., γ_0l ξ_m(t) [see (1)], in phase coherence, and the magnitudes of which are monotonically related to the signal-to-noise ratios in the various links. The complex-weighted received signals are summed, then passed into a filter matched to ξ_m(t);³³ the filter output is sampled at t = T.

eq. (24). Identify 20,2, a; exp (j6;) and Yn(7:) v2. and (eas En*(t) €i(t) dt, respectively, DOT SUOY, ODs Cline in the cited paper with pz, y of the present paper.

32 D. G. Brennan, “On the maximum signal-to-noise ratio realizable from several noisy signals,” Proc. IRE, vol. 43, p. 1530; October, 1955.

If, on the other hand, the transmission vector Γ has no fixed component, i.e., γ_0l = 0, all l, then (108) reduces to

$$S_m = -\sum_{l=1}^{L} \frac{1}{2N_{0l}}\int_0^T |\zeta_l(t)|^2\,dt + \sum_{l=1}^{L} \frac{\rho_l/(4N_{0l}^2)}{1 + \rho_l E_m/N_{0l}} \Big|\int_0^T \xi_m^{*}(t)\,\zeta_l(t)\,dt\Big|^2. \qquad (112)$$

Here the first term is again independent of m and may be neglected. The second term is a generalization of the square-law combination of Pierce:³⁴ before combination, the ζ_l(t) are passed into filters matched to ξ_m(t),³³ the outputs of the filters being then square-law envelope detected and sampled at t = T; the samples are weighted by quantities related to the channel parameters, and the weighted samples are summed. Note that the weights are only equal when ρ_l = ρ and N_0l = N_0, all l, i.e., when all the links are identical. When the signal-to-noise ratio in the lth link is large, i.e., ρ_l E_m/N_0l ≫ 1, the lth weight is 1/(4E_m N_0l); at small signal-to-noise ratios, it is ρ_l/(4N_0l²).
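For very slow, independent fading, (108)–(110) together with (73) and (109) give the full score S_m + B'_m. The Python sketch below assumes uniformly spaced samples of the received waveforms ζ_l(t) and of ξ_m(t), and uses ∫|ξ_m(t)|² dt = 2E_m as in (100); the function and argument names are hypothetical. The matched-filter output sampled at t = T is formed directly as the correlation ∫ ξ_m*(t) w_l(t) dt.

```python
import math


def slow_fading_score(zeta, xi, gamma0, rho, n0, prior, dt):
    """S_m + B'_m for very slow, independent link fading, from (108)-(110), (109), (73).

    zeta[l]: samples of zeta_l(t); xi: samples of xi_m(t);
    gamma0[l], rho[l], n0[l]: gamma_0l, rho_l, N_0l; prior: P_m; dt: sample spacing."""
    energy = 0.5 * sum(abs(x) ** 2 for x in xi) * dt      # E_m, from integral of |xi_m|^2 = 2 E_m
    score = math.log(prior)
    for z_l, g0_l, rho_l, n0_l in zip(zeta, gamma0, rho, n0):
        w_l = [z - g0_l * x for z, x in zip(z_l, xi)]        # (110)
        score -= sum(abs(v) ** 2 for v in w_l) * dt / (2.0 * n0_l)
        # Matched-filter output sampled at t = T: integral of xi_m*(t) w_l(t) dt.
        matched = sum(complex(x).conjugate() * v for x, v in zip(xi, w_l)) * dt
        snr = rho_l * energy / n0_l
        score += (rho_l / (4.0 * n0_l ** 2)) * abs(matched) ** 2 / (1.0 + snr)
        score -= math.log1p(snr)                              # l-th term of (109)
    return score
```

With ρ_l = 0 the score reduces to the coherent combiner of (111) plus ln P_m, and with γ_0l = 0 it reduces to the weighted square-law combiner of (112) plus the bias.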

As a final special case, let us consider a simple example in which there is link-to-link correlation of (slow) fading. More precisely, let us consider a dual-diversity case in which Γ_0 = 0, N_01 = N_02 = N_0, and the covariance matrix of Γ_1 is

$$G = \rho\begin{pmatrix} 1 & \lambda \\ \lambda^{*} & 1 \end{pmatrix}. \qquad (113)$$

That is, the complex correlation coefficient between the random fadings γ_11 and γ_12 is λ. Then, using (88), (90) and (102) in (105) and (106), we obtain, after some manipulation involving the diagonalization of the quadratic form in (105),


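$$S_m = -\frac{1}{2N_0}\int_0^T \left[|\zeta_1(t)|^2 + |\zeta_2(t)|^2\right] dt + \frac{p}{8N_0^2}\left[\frac{(1+|\lambda|)\,|\alpha_1|^2}{1 + pL_m(1+|\lambda|)/N_0} + \frac{(1-|\lambda|)\,|\alpha_2|^2}{1 + pL_m(1-|\lambda|)/N_0}\right] \tag{114}$$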


where we have set

$$\alpha_1 = \int_0^T \xi_m^*(t)\left[\zeta_1(t) + \frac{\lambda}{|\lambda|}\,\zeta_2(t)\right] dt \tag{115}$$


³³ G. L. Turin, “Error probabilities for binary symmetric ideal reception through nonselective slow fading and noise,” Proc. IRE, vol. 46, pp. 1603-1619, Appendix II; September, 1958.

³⁴ J. N. Pierce, “Theoretical diversity improvement in frequency-shift keying,” Proc. IRE, vol. 46, pp. 903-910; May, 1958.


and

$$\alpha_2 = \int_0^T \xi_m^*(t)\left[\zeta_1(t) - \frac{\lambda}{|\lambda|}\,\zeta_2(t)\right] dt. \tag{116}$$

Further, from (107),


at E pes aE poe le | D1 


(117) 


] 

Notice that $\lambda/|\lambda|$ in (115) and (116) is just a phase factor. It is, in fact, a measure of the average phase difference between the signal components of the two received waveforms. The optimum dual-diversity receiver thus first tries to place the signal components of $\zeta_1(t)$ and $\zeta_2(t)$ in approximate phase coherence by the phase-shifting operations in (115) and (116). The phase-shifted received waveforms are then coherently added, (115), and subtracted, (116). The sum and difference are each passed into a filter matched to $\xi_m(t)$. The squared envelopes of the two filter outputs, sampled at $t = T$, are then combined in the weighted manner indicated in (114).
It is easily seen that (114) and (117) reduce to special cases of (112) and (109), respectively, when $\lambda = 0$, i.e., when the fadings are uncorrelated. When the fadings are identical, i.e., $\lambda = 1$, (114) and (117) readily reduce to the expected result. The two received waveforms should be added at the receiver input, and the sum thenceforth treated as a single-diversity signal with a 3-dB greater signal-to-noise ratio [cf. (112)].




APPENDIX I


In order to prove that all operations in the optimum receiver may be carried through using only the real parts of complex quantities, we must show two things. First, all complex filter matrices must be analytic in the sense of (78). Second, all vectors involved in the various stages of the operations must be analytic in the sense of (27) and (28).

The vectors under consideration are, from (42), (45), (68) and (72), $W(t)$, $U(t)$, $\hat V(t)$ and $T(t)$. That $W(t)$ is analytic follows from the fact that sums and products of analytic functions are analytic. Since $\xi_m(t)$, $F(t)$ and $N(t)$ were defined to be analytic, $Z(t)$ of (41) and $W(t)$ of (42) are then also analytic. The other three vectors all appear as filter outputs [see (45), (65), (71)]. They are automatically analytic if the filters are analytic, as may be seen through the use of (78) in (74). We now establish this latter condition.

The complex filter matrices we are concerned with are, from (45), (65) and (71), the matrices $\sqrt{Q}(s, t)$, $\sqrt{P}(s, t)$ and $\int_0^T P(s, u)R_m(u, t)\,du$. Note that all of these are derived from covariance-function matrices:

- $Q(s, t)$ is the inverse, in the sense of (10) and (11), of the covariance-function matrix of $N(t)$;
- $R_m(s, t)$ is the covariance-function matrix of $V(t)$ of (46);
- $P(s, t)$ is the inverse of $I\,\delta(s - t) + R_m(s, t)$.

Now, $N(t)$ is analytic, so the eigenvectors of its Karhunen-Loève expansion [cf. (3)] must also be analytic. Hence, $E[N(s)N'^*(t)]$, represented in the form of (8) (which has real coefficients), must be analytic in the sense of (78). But from (13) and (14) it is clear that the square root of the inverse of $E[N(s)N'^*(t)]$, i.e., $\sqrt{Q}(s, t)$, has a representation that differs from that of $E[N(s)N'^*(t)]$ only in the (real) expansion coefficients, not in the expansion vectors. Thus $\sqrt{Q}(s, t)$ must also be analytic.

Since $\sqrt{Q}(s, t)$ is analytic, $V(t)$ of (46) is analytic. By applying the same type of argument as above to $R_m(s, t)$ and $P(s, t)$ [see (63)], we may easily conclude that $P(s, t)$ is analytic, and hence so is $\sqrt{P}(s, t)$.

Finally, consider the third filter matrix, $\int_0^T P(s, u)R_m(u, t)\,du$. Its representation is the same as the first term of (63), except with (real) coefficients $\sigma_k/(1 + \sigma_k)$, so this filter matrix is analytic too.


APPENDIX II 


In order to prove that $\hat V(t)$ is an optimum estimate of $V(t)$, we have merely to prove that the expansion coefficients of $\hat V(t)$ in (66) are optimum estimates of the expansion coefficients of $V(t)$ in (49). Note that the observables of the problem are the $\theta_k$, computed from …(t) according to (53). Thus, if we interpret “optimum” in the maximum-probability sense, we wish to show that the set of coefficients $\{\eta_k\} = \{\sigma_k\theta_k/(1 + \sigma_k)\}$ maximizes the conditional distribution $\mathrm{pr}[\{\eta_k\}/\{\theta_k\}]$.

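Now, the conditional distribution of the $\{\eta_k\}$ may be written as

$$\mathrm{pr}\left[\{\eta_k\}/\{\theta_k\}\right] = \frac{\mathrm{pr}\left[\{\eta_k\}\right]\,\mathrm{pr}\left[\{\theta_k\}/\{\eta_k\}\right]}{\mathrm{pr}\left[\{\theta_k\}\right]}, \tag{118}$$

where the first factor in the numerator, from (55), is

$$\mathrm{pr}\left[\{\eta_k\}\right] = c_1 \exp\left[-\sum_k \frac{|\eta_k|^2}{\sigma_k}\right]. \tag{119}$$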


In order to find an expression for $\mathrm{pr}[\{\theta_k\}/\{\eta_k\}]$ in (118), we note from (50), (53) and (84) that $\theta_k$ may be written as

$$\theta_k = \eta_k + e_k, \tag{120}$$

where the $e_k$ are the expansion coefficients of the second term in (84). From the Gaussianness of $N(t)$ it follows that the $e_k$ are jointly Gaussian. From (85) it follows that $E[e_k e_j^*] = \delta_{kj}$ for any orthonormal system of expansion vectors. We may therefore write [cf. (36)]

$$\mathrm{pr}\left[\{\theta_k\}/\{\eta_k\}\right] = \mathrm{pr}\left[\{e_k\}\right] = c_2 \exp\left[-\sum_k |\theta_k - \eta_k|^2\right]. \tag{121}$$


Since we have postulated the signal and noise terms in (84) to be independent, we have further, from (57), $E[|\theta_k|^2] = 1 + \sigma_k$. Then

$$\mathrm{pr}\left[\{\theta_k\}\right] = c_3 \exp\left[-\sum_k \frac{|\theta_k|^2}{1 + \sigma_k}\right]. \tag{122}$$


On placing (119), (121) and (122) in (118), we obtain for the a posteriori distribution of the $\eta_k$:


Turin: On Optimal Diversity Reception 


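$$\mathrm{pr}\left[\{\eta_k\}/\{\theta_k\}\right] = \frac{c_1 c_2}{c_3} \exp\left[-\sum_k \frac{1 + \sigma_k}{\sigma_k}\left|\eta_k - \frac{\sigma_k\theta_k}{1 + \sigma_k}\right|^2\right]. \tag{123}$$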


Clearly, the a posteriori most probable set of $\eta$'s is that for which the exponent in (123) is zero, i.e., $\{\eta_k\} = \{\sigma_k\theta_k/(1 + \sigma_k)\}$, which was to be proved. Note further that this optimal set of $\eta_k$'s is the set of conditional means of the $\eta_k$. Thus the set is optimum in the minimum-variance sense as well as in the maximum-probability sense.
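Below is a scalar Monte Carlo check of Appendix II. It assumes $\eta_k$ and $e_k$ are independent circular complex Gaussians with variances $\sigma_k$ and 1, so that $E|\theta_k|^2 = 1 + \sigma_k$ as stated above. The sketch compares the mean-square error of $\sigma_k\theta_k/(1+\sigma_k)$ with that of $\theta_k$ itself. The value $\sigma_k = 3$, the sample size, the seed and the helper names are all illustrative choices, not from the paper.

```python
import numpy as np

def complex_gaussian(rng, variance, size):
    """Circularly symmetric complex Gaussian samples with E|z|^2 = variance."""
    scale = np.sqrt(variance / 2.0)
    return scale * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

def map_estimate(theta, sigma):
    """Value of eta_k that zeroes the exponent of (123): sigma*theta/(1 + sigma)."""
    return sigma * theta / (1.0 + sigma)

rng = np.random.default_rng(0)
sigma = 3.0                                  # assumed eigenvalue sigma_k for one coefficient
trials = 200_000

eta = complex_gaussian(rng, sigma, trials)   # signal coefficient, E|eta|^2 = sigma
e = complex_gaussian(rng, 1.0, trials)       # noise coefficient, E|e|^2 = 1
theta = eta + e                              # observable, as in (120)

mse_map = np.mean(np.abs(map_estimate(theta, sigma) - eta) ** 2)
mse_raw = np.mean(np.abs(theta - eta) ** 2)  # using theta itself as the estimate
power_theta = np.mean(np.abs(theta) ** 2)    # should be close to 1 + sigma

print(f"MSE of sigma*theta/(1+sigma): {mse_map:.4f} (theory {sigma / (1 + sigma):.4f})")
print(f"MSE of theta itself:          {mse_raw:.4f} (theory 1.0)")
```

The conditional-mean estimate attains the theoretical error $\sigma_k/(1+\sigma_k)$. That error is below the unit error of the raw observable.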


APPENDIX III


As is well known,³⁹ the covariance-function matrix of a stationary, zero-mean, complex analytic vector process $N(t)$ has a real part equal to twice the covariance-function matrix of $\mathrm{Re}\,N(t)$. Its imaginary part is equal to the Hilbert transform of the real part. Thus, (86) should strictly have been written as


E[N(s)N’*(Z)] 
uae 0 
02 . | 
= E => ii) 45 int4 |. (124) 
NS) 


In order to see the approximation involved in using (86) 
instead of (124), let us write 


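$$k(t) = \delta(t) + \frac{j}{\pi t} \tag{125}$$

and consider the operation

$$\int_0^T k(s - t)\,x(t)\,dt = \int_{-\infty}^{\infty} k(s - t)\,x_T(t)\,dt, \tag{126}$$

where $x(t)$ is a complex waveform and

$$x_T(t) = \begin{cases} x(t), & 0 \le t \le T \\ 0, & \text{elsewhere} \end{cases} \tag{127}$$

is a truncation of $x(t)$.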

Note that the right-hand side of (126) is the convolution of $k(t)$ with $x_T(t)$. The equivalent operation in the frequency domain is the multiplication of the Fourier transforms of the two functions. Now, the Fourier transform of $k(t)$ is zero for negative frequencies and 2 for positive frequencies. Suppose $x_T(t)$ has no negative-frequency components in its Fourier transform. Then the operation in (126) merely multiplies the transform of $x_T(t)$, hence $x_T(t)$ itself, by 2. In this case, the effect of $k(t)$ in (126) is precisely the same as that of the operator $2\delta(t)$.

Unfortunately, the truncation operation of (127) precludes the complete absence of negative-frequency components in $x_T(t)$. But if $x(t)$ is complex-analytic, as are

³⁹ See, e.g., M. Zakai, “Second-order properties of pre-envelope and envelope processes,” IRE Trans. on Information Theory, vol. IT-6, pp. 556-559; December, 1960. Note, however, that for a process with nonzero mean this statement is true only within an additive constant equal to the square of the mean of the $\mathrm{Re}\,N(t)$ process.




all the waveforms in this paper, then it has no negative-frequency components. Suppose, further, that only a small fraction of the power (or energy) in $x(t)$ lies below roughly $1/T$ cps in frequency; this includes most applications of interest here. Then the truncation will produce no significant negative-frequency components in $x_T(t)$, and the previous argument concerning the equivalence of $k(t)$ and $2\delta(t)$ in (126) holds to a very good approximation. To this approximation, (86) may be used in place of (124) whenever (124) is used as an integral operator in the manner of (126), and this is uniformly its use in this paper.
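The frequency-domain argument can be illustrated with a discrete, periodic stand-in. The sketch below replaces convolution with $k(t)$ by multiplying a DFT by 2 at positive frequencies, 0 at negative ones and 1 at DC. The DFT model, the frame and truncation lengths, and the carrier frequencies are all illustrative assumptions, not taken from the paper.

The sketch shows two things. The operator acts exactly as $2\delta(t)$ on a one-sided tone. Once the tone is truncated, the match is only approximate, and the error shrinks as the carrier moves well above $1/T$.

```python
import numpy as np

def apply_k(x):
    """Discrete stand-in for convolution with k(t) = delta(t) + j/(pi t):
    multiply the DFT by 2 at positive frequencies, 0 at negative ones, 1 at DC."""
    freqs = np.fft.fftfreq(len(x))
    response = np.where(freqs > 0, 2.0, 0.0)
    response[freqs == 0] = 1.0
    return np.fft.ifft(np.fft.fft(x) * response)

def negative_frequency_fraction(x):
    """Fraction of the DFT energy of x lying at negative frequencies."""
    spectrum = np.abs(np.fft.fft(x)) ** 2
    freqs = np.fft.fftfreq(len(x))
    return spectrum[freqs < 0].sum() / spectrum.sum()

N = 4096                 # samples in the periodic analysis frame
T = 256                  # truncation length, in samples
t = np.arange(N)

# A complex exponential at a positive DFT frequency is exactly one-sided,
# so k acts on it exactly like 2*delta.
x_periodic = np.exp(2j * np.pi * 50 * t / N)
exact_error = np.max(np.abs(apply_k(x_periodic) - 2 * x_periodic))

def truncated_tone(cycles_per_window):
    """exp(j 2 pi f0 t) kept for 0 <= t < T and zeroed elsewhere, with f0 = cycles/T."""
    x = np.exp(2j * np.pi * cycles_per_window * t / T)
    return np.where(t < T, x, 0.0)

results = {}
for cycles in (1, 25.6):  # carrier at 1/T and at many multiples of 1/T
    xT = truncated_tone(cycles)
    rel_err = np.linalg.norm(apply_k(xT) - 2 * xT) / np.linalg.norm(2 * xT)
    results[cycles] = (negative_frequency_fraction(xT), rel_err)
    print(cycles, results[cycles])
```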


APPENDIX IV

On placing (103) into (51), we obtain

$$M\,\xi_m(s) \int_0^T \xi_m^*(t)\,W_k(t)\,dt = \sigma_k W_k(s), \qquad 0 < s < T. \tag{128}$$

Solutions of this are clearly of the form $W_k(t) = F_k\,\xi_m(t)$, where $F_k$ is a time-invariant vector which, from (128),


must satisfy the set of algebraic equations

$$2L_m M F_k = \sigma_k F_k. \tag{129}$$

Since the $\sigma_k$ are the eigenvalues of (129), the $(1 + \sigma_k)$ are the eigenvalues of

$$(I + 2L_m M)F_k = (1 + \sigma_k)F_k, \tag{130}$$

whence, by a well-known result from the theory of linear equations,

$$\left|I + 2L_m M\right| = \prod_k (1 + \sigma_k). \tag{131}$$

Insertion of (131) into (60) leads immediately to (107).


VI. ACKNOWLEDGMENT


It is a pleasure for the author to acknowledge the enlightening discussions he had with Dr. D. R. Anderson and Dr. A. V. Balakrishnan on some of the mathematical theory upon which this work is based.


A New Derivation of the Entropy Expressions*


SOLOMON W. GOLOMB†


* Received by the PGIT, September 30, 1960. This paper presents the results of one phase of research carried out at the Jet Propulsion Laboratory, California Institute of Technology, under Contract No. NASw-6, sponsored by the National Aeronautics and Space Administration.

† Jet Propulsion Lab., California Institute of Technology, Pasadena.


Summary: In the discrete case, the Shannon expression for entropy is obtained as a line integral in probability space. The integrand is the “information density vector” $(\log p_1, \log p_2, \cdots, \log p_n)$. In the continuous case, the continuous analog of information density is integrated to obtain the entropy expression for continuous probability distributions.


Consider the integral

$$\begin{aligned} \int_a^b \log\frac{x}{1-x}\,dx &= \int_a^b \log x\,dx - \int_a^b \log(1-x)\,dx \\ &= \left[x\log x - x\right]_a^b + \left[u\log u - u\right]_{1-a}^{1-b} \\ &= \left[b\log b + (1-b)\log(1-b)\right] - \left[a\log a + (1-a)\log(1-a)\right] \\ &= H(b, 1-b) - H(a, 1-a), \end{aligned} \tag{1}$$

where $H(x, 1 - x)$ is Shannon's entropy function. Suppose an experiment has two possible outcomes, assigned a priori probabilities $a$ and $1 - a$. After receipt of further information, they are assigned the a posteriori probabilities $b$ and $1 - b$. The net change in information (i.e., the quantity of additional information received) is measured by (1).

This suggests the definition $D(x, 1 - x) = \log[x/(1 - x)]$ as the information density for an experiment having two possible outcomes, with probabilities $x$ and $1 - x$. Specifically, integrating the information density $D(x, 1 - x)$ from $x = a$ to $x = b$ yields the net change in information when the probability assigned to $x$ is changed from $a$ to $b$.
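Here is a quick numerical check of (1), assuming natural logarithms and an arbitrary pair $a = 0.3$, $b = 0.9$. The sketch integrates the information density $\log[x/(1-x)]$ with a composite Simpson rule and compares the result with the bracketed closed form.

The closed form is $[b\log b + (1-b)\log(1-b)] - [a\log a + (1-a)\log(1-a)]$. So (1) holds as written when $H(x, 1-x)$ is read as $x\log x + (1-x)\log(1-x)$, the negative of the usual entropy. The function names are hypothetical.

```python
import math

def information_density(x):
    """D(x, 1 - x) = log[x / (1 - x)] (natural logarithm)."""
    return math.log(x / (1 - x))

def sum_x_log_x(x):
    """x log x + (1 - x) log(1 - x): H(x, 1 - x) in the sign convention under which (1) holds."""
    return x * math.log(x) + (1 - x) * math.log(1 - x)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

a, b = 0.3, 0.9          # a priori and a posteriori probabilities of the first outcome
integral = simpson(information_density, a, b)
closed_form = sum_x_log_x(b) - sum_x_log_x(a)
print(integral, closed_form)
```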

If $p$ and $q$ are probabilities, $p + q = 1$, then $D(p, q) = \log(p/q)$. This function frequently occurs as a criterion function in statistical decision theory. For those interested in the axiomatic approach, it suffices to seek an “information density function” $D(p, q)$ which satisfies the single axiom

$$k\,D(p, q) = D\!\left(\frac{p^k}{p^k + q^k}, \frac{q^k}{p^k + q^k}\right). \tag{2}$$


Formally, this may be treated as follows:

Theorem: The only function $D(p, q)$ which satisfies (2) for all $p$ with $0 < p < 1$ and all real $k$ is $D(p, q) = c\log(p/q)$, where the constant $c$ can be considered a change of logarithmic base.

Proof: With $k = 0$ in (2), it is seen that $D(\tfrac12, \tfrac12) = 0$. With $k = -1$, it is seen that $D(p, q) = -D(q, p)$.

If $D(p, q)$ is not identically zero (a degenerate case which corresponds to $c = 0$), then there is a value $p = a \ne \tfrac12$ such that $D(a, 1 - a) \ne 0$. Now take any $p$ with $0 < p < 1$ and $p \ne \tfrac12$; at $p = \tfrac12$ both sides of the theorem are already zero. For such $p$ there is a real number $k$ such that $a = p^k/(p^k + q^k)$. Specifically,

$$k = \frac{\log[a/(1-a)]}{\log(p/q)},$$

and, using (2), we find that

$$D(p, q) = \frac{1}{k}\,D(a, 1 - a) = \log\frac{p}{q}\cdot\frac{D(a, 1 - a)}{\log[a/(1-a)]}.$$

Now let

$$c = \frac{D(a, 1 - a)}{\log[a/(1-a)]}.$$

This is legitimate since $a \ne \tfrac12$ and $0 < a < 1$, and the theorem follows as stated.
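The axiom (2) and the theorem can be spot-checked numerically. The sketch below uses hypothetical helper names and natural logarithms. It evaluates the residual $kD(p,q) - D(p^k/(p^k+q^k),\, q^k/(p^k+q^k))$ for $D = c\log(p/q)$ over a few values of $p$, $k$ and $c$. For contrast, it also tries $D(p,q) = p - q$, which is antisymmetric but not logarithmic, and fails the axiom.

```python
import math

def axiom_residual(D, p, k):
    """k*D(p, q) - D(p^k/(p^k+q^k), q^k/(p^k+q^k)) with q = 1 - p; zero when axiom (2) holds."""
    q = 1 - p
    pk, qk = p ** k, q ** k
    return k * D(p, q) - D(pk / (pk + qk), qk / (pk + qk))

def log_odds(c):
    """The family named by the theorem, D(p, q) = c log(p/q)."""
    return lambda p, q: c * math.log(p / q)

def difference(p, q):
    """An antisymmetric candidate that is not of the form c log(p/q)."""
    return p - q

print(axiom_residual(log_odds(1.0), 0.8, 2.0), axiom_residual(difference, 0.8, 2.0))
```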

The principal “justification” for the axiom (2) is that it leads to the desired theory via an interesting route. However, there is a simpler-looking formulation of (2) which is fully equivalent. It is obtained by defining information density for odds rather than strictly for probabilities (i.e., normalized odds). Specifically, if the odds change from $F : G$ to $F^k : G^k$, the information density is multiplied by $k$:

$$k\,D(F : G) = D(F^k : G^k).$$


In order to generalize (1) to experiments with $n$ possible outcomes, we must perform a line integration from the a priori vector of probabilities $r = (r_1, r_2, \cdots, r_n)$ to the a posteriori vector of probabilities $s = (s_1, s_2, \cdots, s_n)$. The integrand is now the vector $(\log x_1, \log x_2, \cdots, \log x_n)$, integrated with respect to the vector $(dx_1, dx_2, \cdots, dx_n)$. In this way,

$$\int_r^s (\log x_1, \log x_2, \cdots, \log x_n)\cdot(dx_1, dx_2, \cdots, dx_n) = \sum_i s_i\log s_i - \sum_i r_i\log r_i = H(s) - H(r), \tag{3}$$

as desired.
Indeed, for the special case $n = 2$, (3) reduces to

$$\int_{(a,\,1-a)}^{(b,\,1-b)} (\log x_1, \log x_2)\cdot(dx_1, dx_2) = \int_a^b \left[\log x\,dx + \log(1 - x)\,d(1 - x)\right] = \int_a^b \log\frac{x}{1 - x}\,dx,$$

which is the original integral (1).

Essentially, (3) expresses the notion that when the odds on $n$ possible outcomes are $x_1 : x_2 : \cdots : x_n$, the local information flux (i.e., density) is represented by $(\log x_1, \log x_2, \cdots, \log x_n)$. There is a separate component for each of the possible outcomes. When this local behavior is accumulated (i.e., integrated) from initial point $r$ to terminal point $s$ in probability space, the result is the total difference in information between $r$ and $s$.
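The line integral (3) can be evaluated numerically along any path inside the probability simplex. The sketch below uses arbitrary illustrative vectors $r$ and $s$, an intermediate point, and natural logarithms. It integrates along a straight segment and along a two-segment path, and both agree with $\sum s_i\log s_i - \sum r_i\log r_i$.

The agreement reflects the fact that the integrand is the gradient of $\sum_i (x_i\log x_i - x_i)$. The $-x_i$ terms of that function sum to $-1$ everywhere on the simplex, so they drop out.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def segment_integral(r, s):
    """Integral of (log x_1, ..., log x_n) . (dx_1, ..., dx_n) along the segment from r to s."""
    d = [si - ri for ri, si in zip(r, s)]
    return simpson(lambda t: sum(math.log(ri + t * di) * di for ri, di in zip(r, d)), 0.0, 1.0)

def sum_x_log_x(x):
    return sum(xi * math.log(xi) for xi in x)

r = (0.5, 0.3, 0.2)   # a priori probabilities (illustrative)
s = (0.1, 0.1, 0.8)   # a posteriori probabilities (illustrative)
m = (0.2, 0.6, 0.2)   # intermediate point for a second, bent path
straight = segment_integral(r, s)
bent = segment_integral(r, m) + segment_integral(m, s)
expected = sum_x_log_x(s) - sum_x_log_x(r)
print(straight, bent, expected)
```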

For continuous information, a very proper passage to the continuous case of (3) gives the desired entropy expression. (Previous attempts based on the entropy rather than the information density have run into serious obstacles.) Specifically, let $x_i$ be replaced by the continuous probability distribution $x(t)$, so that $\sum (\log x_i)(dx_i)$ is replaced by $\int (\log x(t))(dx(t))$. The analog of (3) is then

$$\begin{aligned}\int_{-\infty}^{\infty}\int_{x_0(t)}^{x_1(t)} \log x(t)\,dx(t)\,dt &= \int_{-\infty}^{\infty}\left[u\log u - u\right]_{x_0(t)}^{x_1(t)}\,dt \\ &= \int_{-\infty}^{\infty} x_1(t)\log x_1(t)\,dt - \int_{-\infty}^{\infty} x_0(t)\log x_0(t)\,dt \\ &= H(x_1(t)) - H(x_0(t)), \end{aligned} \tag{4}$$


where again $H(x(t))$ is the expression for entropy recommended by Shannon.
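Here is a concrete check of (4) under assumed Gaussian densities: $x_0$ with mean 0 and standard deviation 1, and $x_1$ with mean 0.5 and standard deviation 0.4. The sketch integrates $[u\log u - u]_{x_0(t)}^{x_1(t)}$ over $t$.

For a Gaussian density, $\int x\log x\,dt = -\tfrac12\log(2\pi e\sigma^2)$, so the expected difference is $\log(1/0.4) = \log 2.5$. The densities, the integration range $[-12, 12]$ and the helper names are illustrative assumptions, and natural logarithms are used.

```python
import math

def gaussian(mean, std):
    return lambda t: math.exp(-0.5 * ((t - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def u_log_u_minus_u(u):
    """Antiderivative of log u, extended continuously by 0 at u = 0."""
    return u * math.log(u) - u if u > 0 else 0.0

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def gaussian_x_log_x(std):
    """Closed form of the integral of x log x for a Gaussian density."""
    return -0.5 * math.log(2 * math.pi * math.e * std ** 2)

x0 = gaussian(0.0, 1.0)   # a priori density (illustrative)
x1 = gaussian(0.5, 0.4)   # a posteriori density (illustrative)
lhs = simpson(lambda t: u_log_u_minus_u(x1(t)) - u_log_u_minus_u(x0(t)), -12.0, 12.0)
rhs = gaussian_x_log_x(0.4) - gaussian_x_log_x(1.0)
print(lhs, rhs, math.log(2.5))
```

The $-u$ terms contribute nothing in the limit because both densities integrate to 1.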




The Use of Group Codes in Error Detection and Message Retransmission*


W. R. COWELL†


Summary: The paper considers group codes whose function is split between error correction and error detection with retransmission. For a given code, the minimum error probability is obtained when retransmission occurs whenever an error is detected. An estimate of the redundancy added by retransmission is given. The behavior of retransmission channels as the length of the code words increases is also studied. Most of the analysis is for the binary symmetric channel, although some of the results apply to more general channels.


INTRODUCTION 


MUCH of the recent work in coding theory for binary digital channels has involved a search for codes which have good error detecting and correcting properties and yet may be instrumented easily.
Many communication links on which such codes would 
be used permit the transfer of information in both direc- 
tions so that it is possible, just as in human conversation, 
for the receiver to request the retransmission of messages 
or parts of messages in which errors are detected. It is the 
purpose of this paper to consider such an error control 
plan in which a group code is used as the error detector. 
Much of the analysis is carried out for the binary sym- 
metric channel; we will note which results hold for more 
general channels. 


I. THE DECODER


We will consider first a decoder which makes both correction decisions and retransmission decisions depending on the received word. Suppose that the words of length $n$ of a group code $X$ are the input to a binary symmetric channel with transition probability $p$, where $p < \tfrac12$. At the receiver is a decomposition into cosets¹ relative to $X$ of the group of all binary sequences of length $n$ under componentwise modulo 2 addition. A set $A$ of coset “leaders” is selected so that $A$ contains exactly one member of each coset and includes the 0 sequence as the leader of $X$. Let $S$ be a subset of $A$ which contains 0.

The decoder operates as follows. A received sequence $y$ is expressed (uniquely) as $y = a + x$, where $a$ is in $A$ and $x$ is in $X$. If $a$ is in $S$, $y$ is decoded to $x$. If $a$ is not in $S$, the transmitter is instructed, via a reverse channel, to retransmit the code word. We will assume that the reverse channel operates without error and that retransmissions are independent. We also assume that a given word is retransmitted until a word of the form $s + x$ for $s$ in $S$ is received.

Define an error pattern of length $n$ as a sequence of binary digits in which 0 represents a correct digit and 1 represents an error. We then observe that our decoder corrects those error patterns which are words of $S$ and

* Received by the PGIT, October 28, 1960.

† Bell Telephone Labs., Inc., Murray Hill, N. J.

¹ See [4] for a discussion of the algebraic properties of group codes.


requests retransmission when the received word lies in a coset whose leader is not in $S$. If $S$ is the zero sequence alone, retransmission occurs whenever the received word is not a code word. Define the weight of a sequence as the number of 1's in the sequence. The case where each element of $A$ has minimal weight in its coset and $S = A$ gives the “maximum likelihood detector” studied by Slepian [4] and others.


Let $w(x)$ denote the weight of the sequence $x$, and let $d(x, y)$ be the Hamming distance [3] from $x$ to $y$. Note that $d(x, y) = w(x + y)$. We define $\eta(x) = p^{w(x)} q^{n - w(x)}$, where $q = 1 - p$. Then the probability that $y$ is observed at the receiver when $x$ is transmitted is $\eta(y + x)$.
Let $\theta_x$ be the probability that $x$ is retransmitted following a transmission of $x$. Thus, $1 - \theta_x$ is the probability that when $x$ is transmitted we observe at the receiver a sequence of the form $s + x'$, where $s$ is in $S$ and $x'$ is in $X$. This probability is

$$1 - \theta_x = \sum_{s \in S}\sum_{x' \in X} \eta(s + x' + x).$$

It is clearly independent of $x$, since $x' + x$ ranges over $X$ as $x'$ does. We may therefore write

$$1 - \theta = \sum_{s \in S}\sum_{x \in X} \eta(s + x).$$


Let $D_x$ be the probability that if $x$ is transmitted, the sequence observed at the receiver decodes into $x$. This is simply the probability that the observed sequence is of the form $s + x$ with $s$ in $S$, and, therefore,

$$D_x = \sum_{s \in S} \eta(s + x + x) = \sum_{s \in S} \eta(s).$$

Since this is independent of $x$, we shall write $D_x = D$.
Now the probability of decoding into $x$ after exactly $r$ retransmissions, given that $x$ was transmitted, is $\theta^r D$. So the probability of ultimately decoding into $x$, given that $x$ was transmitted, is

$$\sum_{r=0}^{\infty} \theta^r D.$$

This is clearly independent of $x$, and we may write

$$J = \sum_{r=0}^{\infty} \theta^r D = \frac{D}{1 - \theta}$$

for the probability that a word is decoded correctly. Henceforth, we will use the verb “to decode” in the sense “to decode ultimately, possibly after retransmissions have taken place.”

Two special cases are worthy of note. If $S = A$, then $\theta = 0$ and $J = D = \sum_{a \in A} \eta(a)$, the sum over the coset leaders. If $S$ is the 0 word alone, then $1 - \theta = \sum_{x \in X} \eta(x)$ and $D = q^n$, so $J = q^n/(1 - \theta)$.


Theorem 1

Given a group code $X$, let $J^*$ be the probability that a word is decoded correctly when $S$ is the zero word. Let $J$ be the corresponding probability for any other choice of $S$. Then $J^* \ge J$.

Proof: The weight function satisfies the triangle inequality $w(x + y) \le w(x) + w(y)$ for all sequences $x$ and $y$. We recall that $p < \tfrac12$, so that $p/q < 1$. Therefore,

$$\eta(x + y) = p^{w(x+y)} q^{n - w(x+y)} \ge p^{w(x)+w(y)} q^{n - w(x) - w(y)} = q^{-n}\,p^{w(x)} q^{n - w(x)}\,p^{w(y)} q^{n - w(y)} = \frac{\eta(x)\,\eta(y)}{q^n}.$$

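Let $\theta$ be the probability of retransmission for a choice of $S$ which contains more than the element 0. Let $\theta^*$ be the probability of retransmission when $S$ is the zero word. Then,

$$1 - \theta = \sum_{s \in S}\sum_{x \in X} \eta(s + x) \ge \sum_{s \in S}\sum_{x \in X} \frac{\eta(s)\,\eta(x)}{q^n} = \frac{D}{q^n}(1 - \theta^*).$$

Thus,

$$J = \frac{D}{1 - \theta} \le \frac{q^n}{1 - \theta^*} = J^*,$$

which was to be shown.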

Therefore, the probability of correct decoding is greatest when retransmission occurs whenever a word different from a code word is received. This is intuitively plausible: in this case the probability of retransmission is maximal, and hence the redundancy introduced by retransmission is greatest. We proceed next to obtain a more precise formulation of this added redundancy.
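The quantities of Section I can be computed by brute-force enumeration for a small code. The sketch below assumes a (7, 4) Hamming code (the paper's own numerical example uses $n = 8$), $p = 0.01$, and minimal-weight coset leaders. Words are stored as integers, with XOR as modulo 2 addition.

For $S = \{0\}$, for $S = A$ and for every subset of $A$ containing 0, it evaluates $\theta$, $D$ and $J = D/(1-\theta)$. It then checks the conclusion of Theorem 1 for this code. All function names are hypothetical.

```python
from itertools import combinations

def weight(x):
    """Number of 1's in the binary word x (words are stored as ints)."""
    return bin(x).count("1")

def eta(x, n, p):
    """eta(x) = p^w(x) q^(n - w(x)): probability of error pattern x on the BSC."""
    w = weight(x)
    return p ** w * (1 - p) ** (n - w)

def span(rows):
    """All modulo-2 combinations of the generator rows: the group code X."""
    code = {0}
    for g in rows:
        code |= {c ^ g for c in code}
    return code

def min_weight_leaders(code, n):
    """A set A with one minimal-weight leader per coset; 0 comes first."""
    leaders, covered = [], set()
    for y in sorted(range(2 ** n), key=weight):
        if y not in covered:
            leaders.append(y)
            covered |= {y ^ c for c in code}
    return leaders

def decoding_stats(code, S, n, p):
    """(theta, D, J) for a leader subset S, as defined in Section I."""
    one_minus_theta = sum(eta(s ^ x, n, p) for s in S for x in code)
    D = sum(eta(s, n, p) for s in S)
    return 1 - one_minus_theta, D, D / one_minus_theta

n, p = 7, 0.01
X = span([0b1000110, 0b0100101, 0b0010011, 0b0001111])   # illustrative (7, 4) Hamming code
A = min_weight_leaders(X, n)

theta_star, _, J_star = decoding_stats(X, [0], n, p)   # S = {0}: detect and retransmit
theta_all, _, J_all = decoding_stats(X, A, n, p)       # S = A: correct only
# Theorem 1: no S containing 0 does better than S = {0}
gaps = [J_star - decoding_stats(X, [0, *extra], n, p)[2]
        for k in range(len(A)) for extra in combinations(A[1:], k)]
print(theta_star, J_star, theta_all, J_all, min(gaps))
```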


II. CODE EFFICIENCY


Suppose the code words of a group code $X$ have $m$ information places and $f = n - m$ parity check places. Define the efficiency of $X$ used as a corrector-detector as the ratio of $m$ to the mean number of digits transmitted until a word is decoded at the receiver. Suppose further that $L \ge 0$ digits are “lost” or “wasted” whenever a retransmission takes place. We will think of these lost digits as adding to the total number of digits transmitted until a given word is decoded.

Consider a real system that requires a certain time to “reset” or “turn around” before retransmitting. There, the digits that would have been transmitted during this reset time would be regarded as lost. So would digits used to re-establish synchronization, digits thrown away because of an interleaved transmission pattern, and so on. In determining $L$, we will not include the digits of the retransmitted code word itself.


Theorem 2

Suppose a group code of length $n$ has $m$ information places and is used with a corrector-detector such that the probability of retransmission is $\theta$. If $L$ digits are lost on each retransmission, the efficiency of the code is

$$e = \frac{m(1 - \theta)}{n + L\theta}.$$


Proof: Suppose exactly $i$ retransmissions occur before a given word is decoded, i.e., the word is decoded on the $(i + 1)$st transmission. Then the number of digits transmitted is $(i + 1)n + iL = n + i(n + L)$. The probability of exactly $i$ retransmissions before decoding is $\theta^i(1 - \theta)$. So the expected number of digits transmitted until decoding is

$$\sum_{i=0}^{\infty} \theta^i(1 - \theta)\left[n + i(n + L)\right] = n(1 - \theta)\sum_{i=0}^{\infty}\theta^i + (n + L)(1 - \theta)\sum_{i=0}^{\infty} i\,\theta^i = n + (n + L)\frac{\theta}{1 - \theta}.$$

Therefore,

$$e = \frac{m}{n + (n + L)\theta/(1 - \theta)} = \frac{m(1 - \theta)}{n + L\theta},$$

as required.
Corollary

Assume $\theta > 0$. Compare the corrector-detector of Theorem 2 with a group code of length $n + A$ and $m$ information places used as a corrector only. A necessary and sufficient condition for the efficiency of the corrector-detector to be at least as great is

$$A \ge \frac{\theta(n + L)}{1 - \theta}.$$

Proof: The assertion may be stated as follows:

$$n + L \le \frac{A(1 - \theta)}{\theta}$$

if and only if

$$\frac{m(1 - \theta)}{n + L\theta} \ge \frac{m}{n + A}.$$

This is easily obtained by simple manipulation of the inequalities.

It may be remarked that Theorem 2 and its corollary do not make use of the binary symmetric property. Hence, they could be stated so as to apply to more general channels.

As a numerical example, let us take the group code with $n = 8$, $m = 4$ which is listed by Slepian [4] as the best corrector with these parameters. Take $L = 100$, and use the code as a detector only, i.e., $S$ is the zero word. For several values of $p$, Table I lists $\theta$, $e$, and the smallest positive integer $A$ for which the inequality of the corollary is satisfied. The last column is the probability that a word is decoded in error when the code is used for detection and retransmission only.




TABLE I

p        θ              e              A      1 − J
10⁻¹     5.67 × 10⁻¹    2.67 × 10⁻²    142    5.2 × 10⁻³
10⁻²     7.73 × 10⁻²    2.35 × 10⁻¹    10     …
10⁻³     7.97 × 10⁻³    4.51 × 10⁻¹    1      …
10⁻⁴     8 × 10⁻⁴       4.95 × 10⁻¹    1      < 10⁻⁷
10⁻⁵     8 × 10⁻⁵       4.99 × 10⁻¹    1      …
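The $e$ and $A$ columns of Table I follow from Theorem 2 and its corollary once $\theta$ is known. The sketch below takes the tabulated $\theta$ values as given, since computing them would need the specific $n = 8$, $m = 4$ code, which is not reproduced here. It uses $n = 8$, $m = 4$ and $L = 100$. It also checks the closed form of Theorem 2 against direct summation of the series in the proof. Helper names are hypothetical.

```python
import math

def efficiency(m, n, L, theta):
    """Theorem 2: e = m(1 - theta) / (n + L*theta)."""
    return m * (1 - theta) / (n + L * theta)

def expected_digits(n, L, theta, terms=5000):
    """Truncated series sum_i theta^i (1 - theta) [n + i(n + L)] from the proof."""
    return sum(theta ** i * (1 - theta) * (n + i * (n + L)) for i in range(terms))

def smallest_A(n, L, theta):
    """Smallest positive integer A with m(1 - theta)/(n + L theta) >= m/(n + A),
    i.e. A >= theta (n + L) / (1 - theta)."""
    return max(1, math.ceil(theta * (n + L) / (1 - theta)))

n, m, L = 8, 4, 100
table_theta = [5.67e-1, 7.73e-2, 7.97e-3, 8e-4, 8e-5]   # theta column of Table I
for theta in table_theta:
    print(theta, round(efficiency(m, n, L, theta), 4), smallest_A(n, L, theta))
```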


III. LIMITING BEHAVIOR AS WORD LENGTH INCREASES


In this section, we will be concerned with the case where $S$ is the zero word, so that retransmission occurs whenever a word that is not a code word is observed at the receiver. We will call the ratio $r = 1 - m/n$ the code redundancy. It should be noted that $r \le 1 - e$ because of the decrease in efficiency caused by retransmission.

By an r-sequence of group codes we mean a sequence of group codes of lengths $b, 2b, 3b, \cdots$ which have $a, 2a, 3a, \cdots$ check digits, respectively. Here $a$ and $b$ are fixed, $a \ne 0$, $b > a$, and $r = a/b$ is in lowest terms. Thus, the code redundancy remains fixed while the lengths increase. The code of the sequence of length $bc$ will be referred to as the $c$th code of the sequence and designated by $X_c$. Notationally, $n = bc$ and $m = (b - a)c$.

Now, for any r-sequence, let $\theta_c$ be the probability of retransmission for the $c$th code. Then,

$$1 - \theta_c = \sum_{x \in X_c} \eta(x) \le \sum_{i=0}^{m} \binom{m}{i} p^i q^{n-i},$$

where we have replaced the check digits of each code word with 0's in order to obtain an upper bound. Hence,

$$1 - \theta_c \le q^{n-m}\sum_{i=0}^{m}\binom{m}{i}p^i q^{m-i} = q^{n-m} = q^{ac}.$$

Therefore,

$$\lim_{c\to\infty}(1 - \theta_c) = 0, \qquad \lim_{c\to\infty}\theta_c = 1.$$

From Theorem 2, the efficiency of the $c$th code is

$$e_c = \frac{(1 - r)(1 - \theta_c)}{1 + L\theta_c/(bc)},$$

and consequently $\lim_{c\to\infty} e_c = 0$.

Thus, as the code words increase in length, the probability of detecting errors and retransmitting increases toward 1 and the efficiency decreases toward 0. It is reasonable to ask whether there exist r-sequences such that the probability of correct decoding approaches 1 with increasing code length. The following theorem answers this question. (All logarithms are to the base 2.)


Theorem 3

If $r > -\log q$, there exists an r-sequence such that $\lim_{c\to\infty} J_c = 1$, where $J_c$ is the probability that a word is decoded correctly for the $c$th code. Moreover, if the input is random and $H(X_c \mid X_c')$ is the equivocation per word for the $c$th code, then $\lim_{c\to\infty} H(X_c \mid X_c') = 0$.

Proof: A group code of length $n$ with $m$ information places and $f = ac$ check places is uniquely defined by a parity check matrix of 0's and 1's with $m$ rows and $f$ columns. There are $2^{mf}$ such matrices and $2^{mf}$ codes. For many purposes these codes are not all distinct. It suits our purposes, however, to consider them as different here, since we wish to calculate the average of $1 - \theta$ over this set of codes.

We need the following combinatorial result, whose proof may be found elsewhere.² Suppose that some sequence of $m$ information digits which contains 1 in at least one place is given. Write the corresponding sequence of check digits for each of the $2^{mf}$ possible group codes. Then each of the $2^f$ possible sequences of length $f$ occurs exactly $2^{(m-1)f}$ times as a sequence of check digits.
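The combinatorial fact just quoted can be confirmed by enumeration for small sizes. The sketch assumes that the check digits of an information sequence $u$ are $uP$ over GF(2), with $P$ an $m \times f$ binary matrix. For a fixed nonzero $u$, it counts how often each of the $2^f$ check sequences occurs over all $2^{mf}$ matrices. The sizes $m = 3$, $f = 2$ and the helper names are illustrative.

```python
from collections import Counter
from itertools import product

def check_digits(info, P):
    """Check digits info . P over GF(2); P is a tuple of m rows of f bits each."""
    f = len(P[0])
    return tuple(sum(u * row[j] for u, row in zip(info, P)) % 2 for j in range(f))

def check_sequence_counts(info, m, f):
    """How often each check sequence arises for `info`, over all 2^(mf) matrices P."""
    counts = Counter()
    for bits in product((0, 1), repeat=m * f):
        P = tuple(bits[i * f:(i + 1) * f] for i in range(m))
        counts[check_digits(info, P)] += 1
    return counts

m, f = 3, 2
counts = check_sequence_counts((1, 0, 1), m, f)
print(dict(counts))   # each of the 2^f = 4 sequences appears 2^((m-1)f) = 16 times
```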

Now, sum $1 - \theta$ over all possible group codes with $m$ information places and $f$ check places. First, sum $p^{w(x)} q^{n - w(x)}$ over all code words that have some fixed information sequence with 1 in at least one place. Let the weight of the information places be $i \ne 0$. The sum is then

$$2^{(m-1)f}\sum_{j=0}^{f}\binom{f}{j}p^{i+j}q^{n-i-j} = 2^{(m-1)f}\,p^i q^{m-i}\sum_{j=0}^{f}\binom{f}{j}p^j q^{f-j} = 2^{(m-1)f}\,p^i q^{m-i}.$$

When the information places are all 0, the check places are 0 for every group code. Therefore the sum of $p^{w(x)} q^{n-w(x)} = q^n$ over these code words for all codes is $2^{mf} q^n$.
Having obtained the sum for each sequence of information digits, we take the sum of these sums, i.e., the sum over the sequences of information digits:

$$2^{mf}q^n + 2^{(m-1)f}\sum_{i=1}^{m}\binom{m}{i}p^i q^{m-i} = 2^{mf}q^n + 2^{(m-1)f}(1 - q^m).$$

Hence, the average of $1 - \theta$ over the possible group codes is

$$q^n + 2^{-f}(1 - q^m).$$
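The average of $1 - \theta$ over all group codes can be checked exhaustively for small $m$ and $f$. The sketch assumes the check digits of an information sequence $u$ are $uP$ over GF(2), with $P$ ranging over all $m \times f$ binary matrices. With $S$ the zero word, it computes $1 - \theta = \sum_{x \in X}\eta(x)$ for every one of the $2^{mf}$ codes. It then compares the mean with $q^n + 2^{-f}(1 - q^m)$ and confirms that some code lies at or below the average. The values $m = 3$, $f = 2$, $p = 0.1$ are illustrative.

```python
from itertools import product

def one_minus_theta(P, m, f, p):
    """Sum of eta(x) over the code words (u, uP) when S is the zero word."""
    q, n = 1 - p, m + f
    total = 0.0
    for u in product((0, 1), repeat=m):
        checks = [sum(ui * row[j] for ui, row in zip(u, P)) % 2 for j in range(f)]
        w = sum(u) + sum(checks)
        total += p ** w * q ** (n - w)
    return total

m, f, p = 3, 2, 0.1
q = 1 - p
values = [one_minus_theta(tuple(bits[i * f:(i + 1) * f] for i in range(m)), m, f, p)
          for bits in product((0, 1), repeat=m * f)]
average = sum(values) / len(values)
formula = q ** (m + f) + 2 ** (-f) * (1 - q ** m)
print(average, formula, min(values))
```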


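Now let $r > -\log q$ and construct an r-sequence as follows. For each $c$, select a code for which $1 - \theta_c$ is no more than the average calculated above; such a code always exists, of course. Then, for this r-sequence, consider

$$n(1 - J_c) = n\left(1 - \frac{q^n}{1 - \theta_c}\right) \le n\left(1 - \frac{q^n}{q^n + 2^{-f}(1 - q^m)}\right) = \frac{n\,2^{-f}(1 - q^m)}{q^n + 2^{-f}(1 - q^m)} \le \frac{n}{2^f q^n + 1} = \frac{bc}{(2^a q^b)^c + 1}.$$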


² See [2], ch. 7.


Evidently

$$\lim_{c\to\infty} (bc)^{1/c} = 1,$$

and the condition $r > -\log q$ guarantees that $2^a q^b > 1$; therefore,

$$\lim_{c\to\infty} \frac{2^{ac} q^{bc}}{bc} = \lim_{c\to\infty}\left[\frac{2^a q^b}{(bc)^{1/c}}\right]^c = \infty.$$

Therefore, $\lim_{c\to\infty} n(1 - J_c) = 0$, and the first assertion of the theorem is immediate.
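The role of the condition $r > -\log q$ can be seen numerically from the bound $n(1 - J_c) \le n/(2^f q^n + 1)$ used in this proof, where $2^f q^n = (2^a q^b)^c$. The sketch uses base-2 logarithms and $p = 0.1$, so $-\log q \approx 0.152$; all values are illustrative. It evaluates the bound for $r = 1/5$, above the threshold, and for $r = 1/8$, below it.

```python
import math

def n_times_error_bound(c, a, b, p):
    """Upper bound n / (2^f q^n + 1) on n(1 - J_c), with n = bc and f = ac;
    note that 2^f q^n = (2^a q^b)^c."""
    q = 1 - p
    n, f = b * c, a * c
    return n / (2.0 ** f * q ** n + 1)

p = 0.1
threshold = -math.log2(1 - p)           # about 0.152
for a, b in ((1, 5), (1, 8)):           # r = 0.2 (above threshold) and r = 0.125 (below)
    growth = 2.0 ** a * (1 - p) ** b    # exceeds 1 exactly when a/b > threshold
    print(a / b, growth, [n_times_error_bound(c, a, b, p) for c in (10, 50, 200)])
```

Above the threshold the bound collapses geometrically in $c$. Below it, the bound grows roughly like $n$.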

To prove the second statement, let us recall that the probability that a word $y$ is observed at the receiver when $x$ is transmitted is $\eta(y + x)$. Therefore, the probability of decoding to the code word $x'$ when $x$ is transmitted is

$$\Pr(x' \mid x) = \sum_{r=0}^{\infty}\theta^r\,\eta(x' + x) = \frac{\eta(x' + x)}{1 - \theta}.$$


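For any code $X$, $H(X \mid X')$ is the expected value of

$$Q(x') = -\sum_{x \in X}\Pr(x \mid x')\log\Pr(x \mid x')$$

relative to the $x'$. By the randomness condition, the expected value is the unweighted average of $Q(x')$ over the code words. By the group property, $Q(x')$ is independent of $x'$, so that

$$H(X \mid X') = Q(0).$$

Using the definition of $\eta(x)$, this becomes

$$\begin{aligned} H(X \mid X') &= -\sum_{x \in X}\frac{\eta(x)}{1 - \theta}\left[w(x)\log p + (n - w(x))\log q - \log(1 - \theta)\right] \\ &= \frac{\log q - \log p}{1 - \theta}\sum_{x \in X} w(x)\,\eta(x) - \log\frac{q^n}{1 - \theta} \\ &= (\log q - \log p)\,G - \log J, \end{aligned}$$

where

$$G = \frac{1}{1 - \theta}\sum_{x \in X} w(x)\,\eta(x).$$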


We note parenthetically that $G$ is the expected number of digits in error per received word. The property of $G$ that matters here is that $G$ is dominated by $n(1 - J)$, for

$$G = \frac{1}{1 - \theta}\sum_{x \in X} w(x)\,\eta(x) \le \frac{n}{1 - \theta}\sum_{\substack{x \in X\\ x \ne 0}}\eta(x) = \frac{n}{1 - \theta}\left(1 - \theta - q^n\right) = n\left(1 - \frac{q^n}{1 - \theta}\right) = n(1 - J).$$

Hence, $\lim_{c\to\infty} G_c = 0$ for the r-sequence constructed above.


Thus, from the above formula for $H(X \mid X')$,

$$\lim_{c\to\infty} H(X_c \mid X_c') = 0.$$


This completes the proof. 
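Two steps of this proof can be checked by enumeration for a small code: the equivocation identity $H(X \mid X') = (\log q - \log p)G - \log J$ and the domination $G \le n(1 - J)$. The sketch assumes a (7, 4) Hamming code, $S$ the zero word and equiprobable code words, and uses base-2 logarithms as in Section III. It evaluates $H(X \mid X') = Q(0)$ directly from $\Pr(x \mid 0) = \eta(x)/(1 - \theta)$. The code, the values of $p$ and the function names are illustrative.

```python
import math

def weight(x):
    """Number of 1's in the binary word x (stored as an int)."""
    return bin(x).count("1")

def span(rows):
    """All modulo-2 combinations of the generator rows: the group code X."""
    code = {0}
    for g in rows:
        code |= {c ^ g for c in code}
    return code

def equivocation_check(code, n, p):
    """Return (H computed directly, (log q - log p) G - log J, G, n(1 - J)), base-2 logs,
    for S = {0} and equiprobable code words."""
    q = 1 - p
    eta = {x: p ** weight(x) * q ** (n - weight(x)) for x in code}
    one_minus_theta = sum(eta.values())
    post = {x: v / one_minus_theta for x, v in eta.items()}   # Pr(x | x' = 0)
    H_direct = -sum(v * math.log2(v) for v in post.values())
    J = q ** n / one_minus_theta
    G = sum(weight(x) * v for x, v in post.items())
    H_formula = (math.log2(q) - math.log2(p)) * G - math.log2(J)
    return H_direct, H_formula, G, n * (1 - J)

n = 7
X = span([0b1000110, 0b0100101, 0b0010011, 0b0001111])   # illustrative (7, 4) Hamming code
for p in (0.2, 0.05, 0.01):
    print(p, equivocation_check(X, n, p))
```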


IV. SOME UNSOLVED PROBLEMS


We mention finally several open questions which may 
be worthy of investigation. 


1) For given $n$ and $m$, what is the best retransmission code (in the sense of maximizing $J$)? This question is probably very difficult to answer in general, but it is of practical significance for small $m$ and $n$. The best correction code is not necessarily the best retransmission code. For example, a certain code with $n = 7$, $m = 3$ mentioned by Slepian³ has a higher probability of correct decoding when used as a retransmission code than the “best” group code (in the correction sense) has when used the same way.

2) Can we choose an r-sequence of codes $X_c$, and define a set of coset leaders $S_c$ for each $c$, so that $\lim_{c\to\infty}\theta_c$ is neither 1 nor 0 and yet $\lim_{c\to\infty} J_c = 1$?

3) In Theorem 3, is $-\log q$ the best bound on $r$? We may remark that the work of Elias [1], together with our Theorem 1, guarantees that when $r > -p\log p - q\log q$ there is an r-sequence such that $J \to 1$. However, Theorem 3 is stronger, not only because of the result on equivocation but also because

$$-\log q < -p\log p - q\log q.$$


4) For a given r-sequence define 
R(c) = e. — * HX. bX7) 


as the effective rate per digit of transmitting information for the $c$th code. It is easy to show that $\lim_{c\to\infty} R(c) = 0$. Can one find the maxima of $R(c)$ with respect to $c$? If so, this could lead to a definition of optimum code length for the given r-sequence.

5) The dominating practical question, of course, is how to combine the assurances of Theorem 3 with the estimate of efficiency of Theorem 2. The aim is to construct codes and coding equipment so that $J > 1 - \epsilon$ while the efficiency, the complexity of the instruments, and the cost remain tolerable when $\epsilon$ is reasonably small. Any attempt to answer this question for a particular system introduces variables which we have not considered here. However, Theorem 1 indicates that retransmission as a method of error control deserves further practical attention.


REFERENCES 


[1] P. Elias, “Coding for noisy channels,” 1955 IRE National Convention Record, pt. 4, pp. 37-46.

[2] A. Feinstein, “Foundations of information theory,” McGraw-Hill Book Co., Inc., New York, N. Y.; 1958.

[3] R. W. Hamming, “Error detecting and error correcting codes,” Bell Sys. Tech. J., vol. 29, pp. 147-160; April, 1950.

[4] D. Slepian, “A class of binary signaling alphabets,” Bell Sys. Tech. J., vol. 35, pp. 203-234; January, 1956.


3 See [4], p. 213. 


IRE TRANSACTIONS ON INFORMATION THEORY 


July 


On the Factorization of Rational Matrices* 


D. C. YOULA†, SENIOR MEMBER, IRE


Summary—Many problems in electrical engineering, such as 
the synthesis of linear n ports and the detection and filtration of 
multivariable systems corrupted by stationary additive noise,
depend for their successful solution upon the factorization of a
matrix-valued function of a complex variable p.

This paper presents several algorithms for effecting such
decompositions for the class of rational matrices G(p), i.e., matrices
whose entries are ratios of polynomials in p. The methods employed 
are elementary in nature and center around the Smith canonic 
form of a polynomial matrix. Several nontrivial examples are 
worked out in detail to illustrate the theory. 


I. INTRODUCTION 


It is well known [1]-[3] that many problems involving
the detection and filtration of multivariable systems
contaminated by stationary additive noise can be
reduced to the study of a matrix Wiener-Hopf integral
equation of the type


$$\int_0^{\infty} K(t-\tau)\,W(\tau)\,d\tau = e(t), \qquad t > 0, \tag{1}$$

where K(t) is the covariance matrix of the noise, e(t) is
a deterministic column-vector function prescribed in advance
by the known datum, and W(τ) is the unknown
column vector of filter weighting functions $W_1(\tau)$,
$W_2(\tau), \ldots, W_n(\tau)$. In most practical cases, the noise
possesses a rational absolutely continuous spectral density
matrix:


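$$K(t) = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} G(p)\,e^{pt}\,dp, \qquad j = \sqrt{-1}, \tag{2}$$

where

$$G(p) = \int_{-\infty}^{+\infty} K(t)\,e^{-pt}\,dt \tag{3}$$

is $n \times n$ and has rational entries. Moreover [4],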


1) $\overline{G(\bar p)} = G(p)$.

2) $G(-p) = G'(p)$.

3) $b^*G(j\omega)b \ge 0$ for every n-vector b and every real
finite ω. For short, $G(j\omega) \ge 0$.


To solve (1) by the Wiener-Hopf technique it suffices 
to exhibit a factorization of G(p) of the form (A’ denotes 
the transpose of the matrix A) 


G(p) = H’(—p)H(p) (4) 


* Received by the PGIT, October 18, 1960. The work reported
herein was sponsored by the AF Cambridge Res. Ctr., L. G.
Hanscom Field, Bedford, Mass., under Contract No. AF-19(604)-4143,
and presented as Res. Rept. PIBMRI 855-60.

† Microwave Res. Inst., Polytechnic Inst. of Brooklyn, Brooklyn,
N. Y.


with the following properties [11], [2]:

1) H(p) is rational and analytic together with its
inverse $H^{-1}(p)$ in a right half-plane Re p > −μ,
μ > 0;

2) H(p) is real; i.e., $\overline{H(\bar p)} = H(p)$.


The object of this paper is to describe a specific algorithm
for effecting such decompositions for the class of rational
matrices and to consider some related questions.


II. PRELIMINARY NOTATION AND DEFINITIONS


Let A be an arbitrary matrix. Then A', Ā, A*, $A^{-1}$
and |A| denote the transpose, the complex conjugate,
the adjoint ($\bar A'$), the inverse and the determinant of A,
respectively.

A diagonal matrix A with diagonal elements $\mu_1,
\mu_2, \ldots, \mu_n$ is written as $A = \mathrm{diag}[\mu_1, \mu_2, \ldots, \mu_n]$. Column
vectors are represented by x, y, etc., or in the alternative
fashion $x = (x_1, x_2, \ldots, x_n)'$ whenever it is desirable to
indicate the components explicitly; $1_n$, $0_n$ and $0_{n,m}$ are,
in the same order, the n × n identity matrix, the n-dimensional
zero vector and the n × m zero matrix.

A matrix A(p) is polynomial if each of its entries is a
polynomial in p. A(p) is rational if each of its elements is
rational in p; i.e.,

$$a_{kl}(p) = \frac{f_{kl}(p)}{g_{kl}(p)},$$

$f_{kl}(p)$ and $g_{kl}(p)$ being polynomials.

A(p) is said to be real if $\overline{A(\bar p)} = A(p)$. In particular,
$\overline{A(j\omega)} = A(-j\omega)$ for all real ω.

The non-negative integer r(A) is the normal rank of
the rational matrix A(p) if 1) there exists at least one
subminor of order r which does not vanish identically,
and 2) all minors of order greater than r vanish identically.
Clearly the rank of A(p) can fall below its normal rank
at most on a finite set of points in the p plane.
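The definition suggests a simple numerical check: evaluate A(p) at a few random points and keep the largest rank found. The sketch below is a Monte Carlo illustration under that assumption, not a procedure from the paper; `rank`, `normal_rank`, and the two example matrices are hypothetical. Exact `Fraction` arithmetic keeps the rank decisions free of rounding error.

```python
from fractions import Fraction
import random


def rank(rows):
    """Exact rank of a matrix (list of rows) by Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # look for a nonzero pivot at or below row rk in column c
        piv = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                f = m[i][c] / m[rk][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def normal_rank(A, trials=5, seed=0):
    """Estimate r(A) as the largest rank of A(p0) over random rational p0.

    The rank of A(p0) falls below r(A) only at finitely many points
    (or at poles), so random samples give r(A) with high probability.
    """
    rng = random.Random(seed)
    best = 0
    for _ in range(trials):
        p0 = Fraction(rng.randint(-10**6, 10**6), rng.randint(1, 10**6))
        try:
            best = max(best, rank(A(p0)))
        except ZeroDivisionError:  # p0 landed on a pole; skip it
            continue
    return best


# Hypothetical examples: the rows of A1 are proportional for every p,
# while A2 has determinant p**2 - 1, which vanishes only at p = 1 and p = -1.
A1 = lambda p: [[p, p**2], [1, p]]
A2 = lambda p: [[p, 1], [1, p]]
print(normal_rank(A1), normal_rank(A2), rank(A2(Fraction(1))))
```

Here A2 has normal rank 2 even though its rank drops to 1 at p = 1, which is exactly the finite exceptional set mentioned above.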

A nonsquare matrix does not possess an inverse in the
ordinary sense. However, it may have either a right or
left inverse. Thus if A is m × n, A possesses a right
inverse $A^{-1}$, such that $AA^{-1} = 1_m$, if and only if m ≤ n
and r(A) = m.

An elementary polynomial matrix is a polynomial 
matrix possessing either a right or left polynomial inverse. 
A square matrix A(p) is elementary if and only if its 
determinant is a constant independent of p. 

A(p) is analytic in a region of the p plane if all its 
entries are analytic in this region. 

The point po is a pole of A(p) if some element of A (p) 
has a pole at p = po. 

If $p_0$ is a pole of the rational matrix A(p), each element
of A may be expanded in partial fractions, and after
collecting all those terms having poles at $p_0$ there is obtained,
for $p_0 \ne \infty$,

$$A(p) = (p-p_0)^{-k}A_{-k} + (p-p_0)^{-k+1}A_{-k+1} + \cdots + (p-p_0)^{-1}A_{-1} + A_0(p), \tag{5}$$

where $A_0(p_0)$ is finite, $A_{-k} \ne 0$, and $A_{-k}, \ldots, A_{-1}$
are constant matrices. If $p_0 = \infty$, $(p - p_0)^{-1}$ is replaced
by p. The matrices $A_{-k}, \ldots, A_{-1}$ and $A_0(p)$ are uniquely
defined by their construction from A(p).

Def. 1: If A(p) is given by (5), then k is the order of
the pole of A(p) at $p = p_0$.
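Def. 1 can be evaluated entry by entry. In lowest terms, the order of the pole of $a_{kl}(p)$ at a finite $p_0$ is the multiplicity of $p_0$ as a zero of the denominator minus its multiplicity as a zero of the numerator, and the order for A(p) is the largest of these. A minimal sketch, assuming each entry is given as a (numerator, denominator) pair of coefficient lists with the highest power first; the helper names and the example matrix are hypothetical, and $p_0 = \infty$ is not handled.

```python
from fractions import Fraction


def root_multiplicity(coeffs, p0):
    """Number of times (p - p0) divides the polynomial (highest power first)."""
    c = [Fraction(x) for x in coeffs]
    k = 0
    while len(c) > 1:
        # synthetic division by (p - p0); the last value is the remainder
        q = [c[0]]
        for a in c[1:]:
            q.append(a + q[-1] * p0)
        remainder = q.pop()
        if remainder != 0:
            break
        c, k = q, k + 1
    return k


def pole_order(A, p0):
    """Order of the pole of a rational matrix at a finite point p0 (Def. 1)."""
    order = 0
    for row in A:
        for num, den in row:
            if not any(num):  # an identically zero entry has no pole
                continue
            excess = root_multiplicity(den, p0) - root_multiplicity(num, p0)
            order = max(order, excess)
    return order


# Hypothetical example: A(p) = [[1/(p(p-1)^2), 1], [p, (p+1)/(p-1)]]
A = [[([1], [1, -2, 1, 0]), ([1], [1])],
     [([1, 0], [1]), ([1, 1], [1, -1])]]
print(pole_order(A, 1), pole_order(A, 0), pole_order(A, 2))  # 2 1 0
```

Subtracting multiplicities avoids computing a gcd: a common factor $(p - p_0)$ in numerator and denominator cancels automatically.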

Def. 2: A rational matrix A(p) is said to be paraconjugate
hermitian if $A^*(-p) = A(p)$. Hence, on the real-frequency
axis p = jω, $A^*(-j\omega)$ is the ordinary adjoint of A(jω), so
that A(jω) is hermitian in the ordinary sense. For real A(p),
$A^*(-p) = A'(-p)$ and the paraconjugate condition
simplifies to $A'(-p) = A(p)$. A real paraconjugate
hermitian matrix is called para-hermitian.

Def. 3: A rational m × n matrix A(p) is said to be
paraconjugate unitary if either $A^*(-p)A(p) = 1_n$ or
$A(p)A^*(-p) = 1_m$, or both. On the real-frequency axis
p = jω, $A^*(-p)$ is the ordinary adjoint of A(jω), and A(jω) is
unitary in the usual sense. For real A(p) the paraconjugate unitary
condition simplifies to $A'(-p)A(p) = 1_n$ or $A(p)A'(-p) = 1_m$.
A real paraconjugate unitary matrix is para-unitary.

It is most convenient for typographical reasons to let 


$$A_*(p) = A^*(-p).$$

This notation is used throughout the remainder of the
paper. Observe that $A_{**}(p) = A(p)$ and $(AB)_* = B_*A_*$.
A scalar function f(p) satisfying $f_*(p) = \bar f(-p) = f(p)$ is called
paraconjugate. If f(p) is real and paraconjugate, it is
actually even.
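The starred notation can be spot-checked numerically. A sketch, assuming A(p) is available as a Python callable: conjugating the coefficients of A gives the function whose value at p is the conjugate of A(p̄), so $A_*(p)$ is the conjugate transpose of $A(-\bar p)$. Sampling a handful of points is only a sanity check, not a proof; the sample points and the two example matrices (a para-hermitian G and a para-unitary U) are illustrative assumptions.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def para_conj(A):
    """A_*(p) = A^*(-p): conjugate transpose of A evaluated at -conj(p)."""
    return lambda p: transpose([[complex(z).conjugate() for z in row]
                                for row in A(-p.conjugate())])


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def close(X, Y, tol=1e-12):
    return all(abs(x - y) <= tol
               for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


SAMPLES = [0.3 + 0.7j, -1.2 + 0.4j, 2.0 - 1.5j, 0.9j]  # arbitrary test points


def is_para_hermitian(A):
    As = para_conj(A)
    return all(close(As(p), A(p)) for p in SAMPLES)


def is_para_unitary(A):
    """Checks the condition A_*(p) A(p) = 1_n at the sample points."""
    As = para_conj(A)
    n = len(A(SAMPLES[0])[0])
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return all(close(matmul(As(p), A(p)), eye) for p in SAMPLES)


# Hypothetical examples
G = lambda p: [[2 - p * p, p], [-p, 1]]
U = lambda p: [[1 / (1 + p), p / (1 + p)], [p / (1 + p), 1 / (1 + p)]]
print(is_para_hermitian(G), is_para_unitary(U), is_para_hermitian(lambda p: [[p]]))  # True True False
```

The scalar p fails the test because $p_* = -p$; it is skew-paraconjugate rather than paraconjugate.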

Def. 4: A paraconjugate unitary matrix A(p) is said
to be regular if it is analytic in the right-half plane
Re p > 0.

The structure of rational matrices is the subject of the
classic Smith-McMillan lemma.

Smith-McMillan Lemma [5], [8]: Let G(p) be an m × n
rational matrix of normal rank r. Then there exist two
elementary polynomial matrices C(p) and F(p) of orders
m × r and r × n, respectively, such that


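$$G(p) = C(p)\,\mathrm{diag}\left[\frac{\epsilon_1(p)}{\psi_1(p)}, \frac{\epsilon_2(p)}{\psi_2(p)}, \ldots, \frac{\epsilon_r(p)}{\psi_r(p)}\right]F(p) = C(p)D(p)F(p), \tag{6}$$

where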


a) $\epsilon_k(p)$ and $\psi_k(p)$ are relatively prime polynomials
with unit leading coefficients, 1 ≤ k ≤ r;

b) Each $\epsilon_k(p)$ divides $\epsilon_{k+1}(p)$, 1 ≤ k ≤ r − 1, and
each $\psi_l(p)$ is a factor of $\psi_{l-1}(p)$, 2 ≤ l ≤ r;

c) The diagonal matrix D(p) appearing in (6) is,
subject to a) and b), uniquely determined by G(p).
It is, in fact, canonic;

d) If G(p) is real, the ε's, ψ's, C(p) and F(p) may also
be chosen real;




e) The finite point $p = p_0$ is a pole of G(p) of order
k if and only if it is a zero of $\psi_1(p)$ of order k;

f) The order of p = ∞ as a pole of G(p) is the same as
the order of z = 0 as a pole of $\hat G(z) = G(1/z)$.


A rational matrix is said to be canonic if it is square,
nonsingular and diagonal with the properties a) and b)
listed in the Smith-McMillan lemma. The rational functions
$\epsilon_1/\psi_1, \epsilon_2/\psi_2, \ldots, \epsilon_r/\psi_r$ are generalized "invariant
factors" of G(p). For the sake of brevity, the above
lemma is referred to as the S.M. lemma.


III. ANALYSIS


With these preliminaries out of the way, it is possible 
to begin the analysis leading up to the main factorization 
theorems. From this point on, all matrices are assumed 
to be rational unless stated explicitly otherwise. 

Lemma 1: A matrix G(p) is analytic in the entire p plane 
together with its inverse (either right, left or both) if 
and only if it is an elementary polynomial matrix. 

Proof: The "if" part is obvious. According to e) of the
Smith-McMillan lemma, the analyticity of G(p) for all
p implies that $\psi_1(p)$ is a constant. Thus, by b), all ψ's
are constant. Now note that the existence of a left or
right inverse for G implies that either n = r or m = r,
respectively, and a little thought should convince the
reader that the canonic form for $G^{-1}(p)$ is

$$\mathrm{diag}\left[\frac{\psi_r}{\epsilon_r}, \frac{\psi_{r-1}}{\epsilon_{r-1}}, \ldots, \frac{\psi_1}{\epsilon_1}\right].$$

The analyticity of $G^{-1}(p)$ in the entire p plane implies
that $\epsilon_r(p)$ = constant. Invoking b) again, all ε's are
constant and G(p) is the product of three elementary
polynomial matrices of rank r, Q.E.D.

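Lemma 2: A paraconjugate unitary matrix is bounded
at infinity and analytic on the entire closed p = jω axis.

Proof: Suppose G(p) is m × n and $G_*(p)G(p) = 1_n$.
Thus $G^*(j\omega)G(j\omega) = 1_n$ and, writing out the diagonal
elements in expanded form,

$$\sum_{k=1}^{m} |g_{kl}(j\omega)|^2 = 1, \qquad l = 1, 2, \ldots, n.$$

Hence

$$|g_{kl}(j\omega)| \le 1, \qquad k = 1, 2, \ldots, m;\ l = 1, 2, \ldots, n,$$

for all ω. Since every entry is bounded on the entire p = jω axis,
including p = ∞, no entry can have a pole there, Q.E.D.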
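The bound in the proof is easy to observe on an example. The sketch assumes the hypothetical para-unitary matrix U(p) = [[1, p], [p, 1]]/(1 + p) (one can verify that $U'(-p)U(p) = 1_2$) and evaluates it on p = jω: each column has unit energy, so no entry exceeds 1 in magnitude, even for very large ω.

```python
def U(p):
    # Hypothetical para-unitary matrix: U(p) = [[1, p], [p, 1]] / (1 + p)
    return [[1 / (1 + p), p / (1 + p)], [p / (1 + p), 1 / (1 + p)]]


def column_energies(A, w):
    """For each column l, the sum over k of |a_kl(jw)|^2."""
    X = A(1j * w)
    return [sum(abs(X[k][l]) ** 2 for k in range(len(X)))
            for l in range(len(X[0]))]


def largest_entry(A, w):
    """Largest entry magnitude of A(jw)."""
    return max(abs(z) for row in A(1j * w) for z in row)


for w in [0.0, 0.5, 1.0, 10.0, 1e6]:
    print(w, column_energies(U, w), largest_entry(U, w))
```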

Lemma 3: The only regular paraconjugate unitary
matrices G(p) with analytic inverses in Re p > 0 are
constant unitary matrices. If G(p) is para-unitary it is
real-orthogonal.

Proof: Suppose $G_*(p)G(p) = 1_n$, say, where G(p) is a
regular m × n paraconjugate unitary matrix. The analyticity
of its left inverse $G_*(p) = G^*(-p)$ in Re p > 0 implies that of
$G^*(p)$ in Re p < 0. Now the poles of $G^*(p)$ are the complex
conjugates of those of G(p). Hence G(p) is analytic in the
entire p plane and bounded at infinity (Lemma 2). By
Liouville's theorem it must be a constant unitary matrix.
If G(p) is real it is real-orthogonal, Q.E.D.


Def. 5:¹ Let G(p) be an m × n rational matrix of normal
rank r. A decomposition of the form

$$G(p) = A(p)\Lambda(p)B(p) \tag{7}$$

is said to be a left-standard factorization if

a₁) Λ(p) is r × r, canonic and analytic together with
its inverse in the entire p plane with the possible
exception of a finite number of points on the p = jω
axis;

a₂) A(p) is m × r and analytic together with its left
inverse in Re p < 0;

a₃) B(p) is r × n and analytic together with its right
inverse in Re p ≥ 0.


Interchanging A and B gives rise to a right-standard
factorization. Obviously any left-standard factorization
of G(p) generates a right-standard factorization of G'(p),
G*(p) and G(−p). For example, $G'(p) = B'(p)\Lambda(p)A'(p)$,
etc.

It follows from the Smith-McMillan lemma that any
rational matrix G(p) possesses a left- or right-standard
factorization. For let G(p) = C(p)D(p)F(p), where C
and F are elementary and D canonic. By factoring the
ε's and ψ's appearing in the diagonal elements of D(p)
into the product of three polynomials, the first without
zeros in Re p < 0, the second without zeros in Re p ≠ 0,
and the third without zeros in Re p ≥ 0, it is possible
to write $D(p) = D^-(p)\Lambda(p)D^+(p)$: $D^-(p)$ and its inverse
are analytic in Re p < 0, Λ(p) and $\Lambda^{-1}(p)$ in Re p ≠ 0,
and $D^+(p)$ and its inverse in Re p ≥ 0. Now, choosing
$A(p) = C(p)D^-(p)$ and $B(p) = D^+(p)F(p)$, it is immediate
that the desired breakdown is given by G = AΛB,
Q.E.D.
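The factoring step in this existence argument amounts to sorting the zeros of each $\epsilon_k$ and $\psi_k$ by half-plane. A minimal sketch, assuming the zeros are already known (finding them is a separate root-finding problem): the factor whose zeros lie only in Re p > 0 goes into $D^-(p)$, the factor with zeros on Re p = 0 into Λ(p), and the factor whose zeros lie only in Re p < 0 into $D^+(p)$. The function names, the tolerance, and the sample zeros are hypothetical.

```python
def poly_from_roots(roots):
    """Monic polynomial with the given roots; coefficients highest power first."""
    c = [1]
    for r in roots:
        # multiply the current polynomial by (p - r)
        c = [a - r * b for a, b in zip(c + [0], [0] + c)]
    return c


def polymul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def split_by_half_plane(roots, tol=1e-12):
    """Zeros in Re p > 0, on Re p = 0, and in Re p < 0, respectively."""
    right = [r for r in roots if r.real > tol]
    axis = [r for r in roots if abs(r.real) <= tol]
    left = [r for r in roots if r.real < -tol]
    return right, axis, left


roots = [complex(r) for r in [-2, 0, 0, 3, 1j, -1j]]  # hypothetical zeros
right, axis, left = split_by_half_plane(roots)
f_minus, f_axis, f_plus = (poly_from_roots(z) for z in (right, axis, left))
original = poly_from_roots(roots)
product = polymul(polymul(f_minus, f_axis), f_plus)  # should equal original
print(f_minus, f_axis, f_plus)
```

Here f_axis is $p^2(p^2 + 1)$, which stays in Λ(p), while $p - 3$ and $p + 2$ go to $D^-$ and $D^+$.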

Suppose that G(p) admits two left-standard factorizations

$$G = A\Lambda B = A_1\Lambda_1B_1. \tag{8}$$

Then

$$\Lambda_1^{-1}A_1^{-1}A\Lambda = B_1B^{-1}. \tag{9}$$

By definition the right-hand side of (9) is analytic in
Re p ≥ 0 and the left-hand side in Re p < 0. Thus $B_1B^{-1}$
is analytic in the entire p plane. According to (8) the
inverse of $B_1B^{-1}$ is $\Lambda^{-1}A^{-1}A_1\Lambda_1 = BB_1^{-1}$ and is therefore
also analytic in the entire p plane. By Lemma 1,
$B_1B^{-1}$ is an elementary r × r polynomial matrix N(p).
Similarly, $A_1^{-1}A$ is an r × r elementary polynomial
matrix M(p). From (8),

$$M(p)\Lambda(p)N^{-1}(p) = \Lambda_1(p).$$


1 The reader is warned that this definition is not the same as 
that given in Goldberg and Krein [11]. 




Since Λ(p) and Λ₁(p) are both canonic, Λ(p) = Λ₁(p)
by the S.M. lemma. Thus,

$$M(p) = \Lambda(p)N(p)\Lambda^{-1}(p), \tag{10}$$
$$B_1(p) = N(p)B(p), \tag{11}$$
$$A_1(p) = A(p)\Lambda(p)N^{-1}(p)\Lambda^{-1}(p) = A(p)M^{-1}(p). \tag{12}$$


These results are summarized in Theorem 1.

Theorem 1: Let G(p) possess the two left-standard
factorizations $G = A\Lambda B = A_1\Lambda_1B_1$. Then,

a) $\Lambda(p) = \Lambda_1(p)$;

b) $A_1(p) = A(p)M^{-1}(p)$ and $B_1(p) = N(p)B(p)$,
where M(p) and N(p) are any two r × r elementary
polynomial matrices which transform Λ(p) into itself,
viz., $M(p)\Lambda(p)N^{-1}(p) = \Lambda(p)$.

Corollary: The canonic matrix Λ(p) appearing in either
a left-standard or right-standard factorization of an
m × n matrix G(p) of normal rank r(G) is equal to the
r × r identity matrix $1_r$ if and only if G(p) is analytic
and r(G) is constant on the entire finite p = jω axis.
In this case, if AB and A₁B₁ are any two standard factorizations
of G, $A_1(p) = A(p)N^{-1}(p)$ and $B_1(p) = N(p)B(p)$,
N(p) being an arbitrary r × r elementary polynomial
matrix.

Proof: The "if" part is immediate. Now the analyticity
of G(p) on the p = jω axis implies that all the denominator
polynomials in Λ(p) are unity. This, in turn, leads to the
conclusion that r(G) is constant on p = jω only if all
numerator polynomials in Λ(p) are unity. Thus $\Lambda(p) = 1_r$.
The remaining statements are a consequence of Theorem
1, part b), Q.E.D.

For paraconjugate hermitian matrices (see Def. 2),
M and N are tied together in a very specific manner.
Thus, suppose $G(p) = G_*(p)$, and let $G = A\Lambda B$ be a
left-standard factorization. Then $G(p) = G_*(p) =
B_*(p)\Lambda_*(p)A_*(p)$. Except, perhaps, for the signs of some
of its diagonal elements, $\Lambda_*(p)$ is also canonic, whence,
from Theorem 1,

$$\Lambda_*(p) = \Sigma\Lambda(p), \tag{13}$$

where Σ is an r × r diagonal matrix whose diagonal
elements are ±1. In other words,

$$G(p) = B_*(p)\Sigma\Lambda(p)A_*(p) \tag{13a}$$

is also a left-standard factorization. Invoking Theorem
1 again,

$$A_*(p) = N(p)B(p), \tag{14}$$
$$B_*(p)\Sigma = A(p)M^{-1}(p), \tag{15}$$
$$\therefore\quad A_*(p) = N(p)\Sigma M_*^{-1}(p)A_*(p). \tag{16}$$

Since $A_*(p)$ has a right inverse,

$$N(p) = M_*(p)\Sigma, \tag{17}$$

in which M(p) is any r × r elementary polynomial matrix
satisfying [see (10)]

$$\Lambda(p)M_*(p) = M(p)\Lambda_*(p). \tag{18}$$

According to (13), each diagonal element of Λ(p) is
either a paraconjugate or skew-paraconjugate rational
function; i.e., either $\Lambda_{kk*}(p) = \Lambda_{kk}(p)$ or $\Lambda_{kk*}(p) =
-\Lambda_{kk}(p)$ (k = 1, 2, ..., r). From (10), |N(p)| = |M(p)|.
Thus, by (17), $|N(p)| = |\Sigma|\cdot|M_*(p)| = \pm\overline{|N(p)|}$,
depending on whether |Σ| = ±1, and |N(p)| is either
purely real or purely imaginary. When G(p) is
para-hermitian, |N(p)| is real, |Σ| = +1 and the
number of odd rational functions appearing in Λ(p) is
even. The above statements can be made much more
precise for the class of non-negative paraconjugate
hermitian matrices.

Lemma 4: Let G(p) be an n × n paraconjugate
hermitian matrix of normal rank r which is non-negative
on the real-frequency axis; i.e., $b^*G(j\omega)b \ge 0$ for every
n-vector b and every real ω. Then 1) its S.M. canonic
form satisfies $D_*(p) = \Sigma D(p)$, and 2) the real-frequency
zeros and poles of the diagonal elements of D(p) are of
even multiplicity.

Proof: Let G(p) = C(p)D(p)F(p) be the S.M. form
of G(p). Since $G(p) = G_*(p)$, $C(p)D(p)F(p) =
F_*(p)D_*(p)C_*(p)$. Hence, by a previous argument,
$D_*(p) = \Sigma D(p)$, where Σ is an r × r diagonal matrix
whose diagonal elements are ±1, and therefore
each diagonal element of D(p) is either paraconjugate or
skew-paraconjugate. Thus any zero or pole $p_0$ is
accompanied by a zero or pole $-\bar p_0$, and therefore

$$D(p) = \Sigma_1\lambda_*(p)\Lambda(p)\lambda(p) \tag{19}$$

and

$$\Lambda_*(p) = \Sigma_2\Lambda(p), \tag{19a}$$

where λ(p) is rational, diagonal and analytic together
with its inverse in Re p > 0; Λ(p) is canonic, the zeros
and poles of its diagonal elements being entirely confined
to the p = jω axis.

Since all the principal minors of G(jω) are non-negative,
any real-frequency pole of G(p) of order k must be a
pole of order k of at least one diagonal element $g_{mm}(p)$.
Under the assumption that the numerators and denominators
of all entries in G(p) are relatively prime,
$g_{mm}(j\omega) \ge 0$ implies that any one of its poles on p = jω
is of even multiplicity; i.e., k is always an even integer
and the denominator of $g_{mm}(p)$ is the square of a monic
polynomial which is either paraconjugate or skew-paraconjugate.

Denote the real-frequency poles of G(p) by $p = j\omega_0,
j\omega_1, \ldots, j\omega_s$, and let $l_0, l_1, \ldots, l_s$ be their highest
respective multiplicities in any nondiagonal element.




Define the polynomial μ(p) by

$$\mu(p) = \prod_{i=0}^{s}(p - j\omega_i)^{l_i}.$$

Clearly, the only elements of $\hat G(p) = \mu G$ possessing
real-frequency poles are diagonal. Set $D(p) = \mathrm{diag}[\epsilon_1/\psi_1,
\epsilon_2/\psi_2, \ldots, \epsilon_r/\psi_r]$. The S.M. canonic form for μG is

$$\hat D(p) = \mathrm{diag}\left[\frac{\epsilon_1}{\hat\psi_1}, \frac{\hat\epsilon_2}{\hat\psi_2}, \ldots, \frac{\hat\epsilon_r}{\hat\psi_r}\right], \tag{20}$$

where $\hat\psi_1 = \psi_1/\mu$ and $\hat\epsilon_i/\hat\psi_i$ is $\mu\epsilon_i/\psi_i$ in lowest normalized
terms (i = 2, 3, ..., r); $\hat\psi_i$ differs from $\psi_i$ (i = 2, 3, ..., r)
if and only if μ and $\psi_i$ have a factor in common. Now
let $\gamma_1^{(i)} \ge \gamma_2^{(i)} \ge \cdots \ge \gamma_r^{(i)}$ be the orders of $j\omega_i$
(i = 0, 1, ..., s; ω₀ = 0) as a zero of $\hat\psi_1, \hat\psi_2, \ldots, \hat\psi_r$,
respectively. Similarly, let $\alpha_1^{(i)} \ge \alpha_2^{(i)} \ge \cdots \ge \alpha_r^{(i)}$
be the orders of $j\omega_i$ as a pole of the diagonal elements
of μG arranged in nonincreasing sequence. By Theorem
5.29 of McMillan [5],

$$\gamma_k^{(i)} = \alpha_k^{(i)} \qquad (k = 1, 2, \ldots, r;\ i = 0, 1, \ldots, s). \tag{21}$$

Thus the order of $j\omega_i$ as a zero of $\hat\psi_k(p)$ (i = 0, 1, ..., s;
k = 1, 2, ..., r) is equal to its order as a pole of some
diagonal element of $\hat G(p)$, and is therefore an even integer.
To sum up, every denominator appearing in Λ(p) is the
square of a monic polynomial which is either paraconjugate
or skew-paraconjugate.

As regards the numerators of Λ(p), note that from
(13a) and (14),

$$[B_*^{-1}(p)G(p)B^{-1}(p)]^{-1} = N^{-1}(p)\Lambda^{-1}(p)\Sigma, \tag{23}$$

so that $\Lambda^{-1}(p)$ is real-frequency canonic for the paraconjugate
hermitian matrix appearing on the left-hand
side of (23). This matrix is also non-negative on p = jω.
Consequently, all denominators of $\Lambda^{-1}(p)$, and therefore
all numerators of Λ(p), are the squares of either
paraconjugate or skew-paraconjugate functions. Gathering
everything together, $\Lambda_*(p) = \Lambda(p) = \theta^2(p)$, $\theta_*(p) = \Sigma_3\theta(p)$,
$D_*(p) = D(p)$, $\Sigma = \Sigma_2 = 1_r$, and

$$D(p) = \Sigma_4\lambda_*(p)\theta_*(p)\theta(p)\lambda(p); \tag{24}$$

$\Sigma_4 = \Sigma_1\Sigma_3$; λ(p) is diagonal and analytic with its inverse
in Re p > 0; θ(p) is diagonal and analytic with its inverse
in Re p ≠ 0; and Σ₁, Σ₃ and Σ₄ are r × r diagonal matrices
whose diagonal elements are ±1, Q.E.D.
Enough material is now on hand for the main theorem.

Theorem 2: Let $G(p) = G_*(p)$ be a rational n × n
paraconjugate hermitian matrix of normal rank r which
is non-negative on the real-frequency axis p = jω. Then
there exists an r × n rational matrix H(p) such that

a₁) $G(p) = H_*(p)H(p)$;

a₂) H(p) and $H^{-1}(p)$, its right inverse, are both analytic
in Re p > 0;

a₃) H(p) is unique up to within a constant unitary
r × r matrix multiplier on the left; i.e., if H₁(p)
also satisfies a₁) and a₂), H₁(p) = TH(p), where T
is r × r, constant and satisfies $T^*T = 1_r$;

a₄) Any factorization of the form $G(p) = L_*(p)L(p)$
in which L(p) is r × n, rational and analytic in
Re p > 0, is given by L(p) = V(p)H(p), V(p)
being an arbitrary, rational, regular r × r
paraconjugate unitary matrix;

a₅) If G(p) is analytic on the finite p = jω axis, H(p)
is analytic in a right semi-infinite strip Re p > −τ,
τ > 0;

a₆) If G(p) is analytic and r(G) is invariant on the
finite p = jω axis, $H^{-1}(p)$ is analytic in a right
semi-infinite strip Re p > −τ₁, τ₁ > 0;

a₇) If G(p) is real, H(p) and V(p) are real and T is
real-orthogonal.


Proof: Consider statement a₃) first, and let H(p) and
H₁(p) be two matrices satisfying a₁) and a₂). Then

$$H_*(p)H(p) = H_{1*}(p)H_1(p), \tag{25}$$
$$V_*(p)V(p) = 1_r, \tag{26}$$

where $V(p) = H_1(p)H^{-1}(p)$ is obviously analytic in
Re p > 0; i.e., V(p) is a regular r × r paraconjugate
unitary matrix. But from (25),

$$V(p) = H_{1*}^{-1}(p)H_*(p), \tag{27}$$

and is therefore also analytic in Re p < 0. By Lemma 3,
V(p) is a constant r × r unitary matrix T. Hence
H₁(p) = TH(p), Q.E.D.

The proof of a₄) proceeds along the same lines and is
omitted.

To prove the existence of an H(p) with the properties
a₁) and a₂) is of course the difficult part.

Step 1: Reduce G(p) to the S.M. canonic form. One
procedure for doing this is the following: assuming that
the numerator and denominator of every entry in G are
relatively prime, write

$$\hat G(p) = g(p)G(p), \tag{28}$$

where g(p) is the normalized lowest common multiple
of all denominators appearing in G(p) and $\hat G(p)$ is a
polynomial matrix. It is easily shown [5] that g(p) = ψ₁(p).
$\hat G(p)$ is now reduced to its Smith form by the technique
described in Gantmacher [8]; i.e.,

$$\hat G(p) = \hat C(p)\hat E(p)\hat F(p), \tag{29}$$

where $\hat C(p)$ and $\hat F(p)$ are n × n elementary polynomial
matrices and

$$\hat E(p) = \mathrm{diag}[\hat e_1(p), \hat e_2(p), \ldots, \hat e_r(p), 0, \ldots, 0]. \tag{30}$$

The ê's are monic polynomials arranged so that $\hat e_i$ divides
$\hat e_{i+1}$ (i = 1, 2, ..., r − 1). Let

$$J = \begin{bmatrix} 1_r \\ 0_{n-r,r} \end{bmatrix}. \tag{31}$$

Then $C(p) = \hat C(p)J$ and $F(p) = J'\hat F(p)$ are n × r and
r × n elementary polynomial matrices, respectively.
Moreover,

$$\hat G(p) = C(p)E(p)F(p), \tag{32}$$

where

$$E(p) = \mathrm{diag}[\hat e_1, \hat e_2, \ldots, \hat e_r]. \tag{33}$$

If now D(p) is defined by

$$D(p) = \mathrm{diag}\left[\frac{\hat e_1}{g}, \frac{\hat e_2}{g}, \ldots, \frac{\hat e_r}{g}\right], \tag{34}$$

each element being normalized and in lowest terms,
the S.M. form for G(p) is G = CDF.
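Step 1 begins by clearing denominators with g(p), the normalized lowest common multiple of the denominators of G(p). A sketch of that computation with exact rational coefficients (highest power first), using the Euclidean algorithm for polynomial gcds. The helper names are hypothetical, and so is the input: the denominators p, 1 − p², and 2(1 − p²), whose normalized l.c.m. is p³ − p, the same g(p) that appears in Example 1.

```python
from fractions import Fraction
from functools import reduce


def trim(a):
    """Drop leading zero coefficients (highest power first)."""
    a = [Fraction(x) for x in a]
    while len(a) > 1 and a[0] == 0:
        a = a[1:]
    return a


def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)


def poly_mod(a, b):
    """Remainder of a divided by b."""
    r, b = trim(a), trim(b)
    while len(r) >= len(b) and any(r):
        f = r[0] / b[0]
        padded = b + [Fraction(0)] * (len(r) - len(b))
        r = trim([x - f * y for x, y in zip(r, padded)])
    return r


def poly_div(a, b):
    """Quotient of a by b (assumes b divides a exactly)."""
    r, b = trim(a), trim(b)
    q = []
    while len(r) >= len(b):
        f = r[0] / b[0]
        q.append(f)
        padded = b + [Fraction(0)] * (len(r) - len(b))
        r = [x - f * y for x, y in zip(r, padded)][1:]
    return trim(q) if q else [Fraction(0)]


def monic(a):
    a = trim(a)
    return [x / a[0] for x in a]


def poly_gcd(a, b):
    a, b = trim(a), trim(b)
    while any(b):
        a, b = b, poly_mod(a, b)
    return monic(a)


def poly_lcm(a, b):
    return monic(poly_div(poly_mul(a, b), poly_gcd(a, b)))


dens = [[1, 0], [-1, 0, 1], [-2, 0, 2]]   # p, 1 - p^2, 2(1 - p^2)
g = reduce(poly_lcm, dens)                 # normalized l.c.m.
print(g)                                    # coefficients of p^3 - p
print([poly_div(g, d) for d in dens])       # g / d for each denominator
```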
Step 2: According to Lemma 4,

$$D(p) = \Sigma\lambda_*(p)\Lambda(p)\lambda(p), \tag{35}$$

where

1) λ(p) is r × r, diagonal and analytic, together with
$\lambda^{-1}(p)$, in Re p > 0;

2) $\Lambda_*(p) = \Lambda(p) = \theta^2(p)$, in which all diagonal elements
of θ(p) are either paraconjugate or skew-paraconjugate.
Furthermore, Λ(p) is canonic and analytic in Re p ≠ 0;

3) Σ is an r × r diagonal matrix with diagonal elements
±1.

Let

$$A(p) = C(p)\Sigma\lambda_*(p), \tag{36}$$
$$B(p) = \lambda(p)F(p). \tag{37}$$

Then

$$G(p) = A(p)\Lambda(p)B(p) \tag{38}$$

is a left-standard factorization of G(p).
Step 3: By (13a) and (14),

$$B_*^{-1}(p)G(p)B^{-1}(p) = \Sigma\Lambda(p)N(p) = \Lambda(p)N(p), \tag{39}$$

where $N(p) = (n_{kl})$ is an r × r elementary polynomial
matrix such that [see (17) and (18)]

$$\Lambda(p)N(p)\Lambda^{-1}(p) = M(p) \tag{40}$$

is also elementary. From (39),

$$I_*(p)G(p)I(p) = \theta_*(p)N(p)\theta^{-1}(p), \tag{41}$$
$$I(p) = B^{-1}(p)\theta^{-1}(p). \tag{42}$$

Hence

$$M(p) = \theta_*(p)N(p)\theta^{-1}(p) \tag{43}$$

is r × r, paraconjugate hermitian and non-negative on
the p = jω axis. Actually a good deal more is true. Observe
that (40) and the canonic nature of Λ(p) imply that $n_{kl}(p)$
is divisible by the polynomial $\Lambda_{ll}(p)/\Lambda_{kk}(p)$, l > k. Since
$\Lambda_{kk}(p) = \theta_k^2(p)$ (k = 1, 2, ..., r), $n_{kl}(p)$ must be divisible
by the polynomial $\theta_l^2(p)/\theta_k^2(p)$ and, a fortiori, by
$f_{kl}(p) = \theta_l(p)/\theta_k(p) = \pm\theta_{l*}(p)/\theta_{k*}(p)$, l > k.
This suffices to establish that M(p) is polynomial.
But |M(p)| = ±|N(p)| = constant; i.e., M(p) is a
positive paraconjugate hermitian r × r elementary
polynomial matrix. The next step is to demonstrate that


$$M(p) = P_*(p)P(p), \tag{44}$$

P(p) being an r × r elementary polynomial matrix.
After this is achieved, the desired factorization for G(p)
is obtained as $G = H_*(p)H(p)$ with

$$H(p) = P(p)\theta(p)B(p) = P(p)\theta(p)\lambda(p)F(p) = P(p)D^+(p)F(p), \tag{45}$$

where

$$D^+(p) = \theta(p)\lambda(p). \tag{46}$$

By straightforward algebra,

$$H_*H = F_*\lambda_*\theta_*P_*P\theta\lambda F = F_*\lambda_*\theta_*M\theta\lambda F = F_*\lambda_*\theta_*^2N\lambda F = B_*\Lambda NB = G.$$

The ingenious algorithm to be described in Step 4 for
factoring a positive, elementary polynomial paraconjugate
hermitian matrix is due to Oono and Yasuura and
first appeared in a now classic paper [6] dealing with
the synthesis of passive n ports. Another such application
may be found in Youla [7].

Step 4: Because of the positive nature of M(jω), all its
diagonal elements are paraconjugate and positive on
p = jω. Let 2δ₁ ≤ 2δ₂ ≤ ⋯ ≤ 2δ_r be the degrees of
these diagonal entries arranged in nondecreasing order.
The δ's are non-negative integers. Again invoking the
positive character of M(jω), it follows that no element in
M(p) has degree exceeding 2δ_r. Thus δ_r = 0 if and only
if M(p) is a constant hermitian positive definite r × r
matrix, in which case it can be written as P*P by any
number of standard techniques. The Gauss algorithm is
as good as any [8]. Excluding this relatively trivial situation,
δ_r > 0.
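For the constant case δ_r = 0, any factorization M = P*P of a hermitian positive definite matrix will do. As one standard choice (a close relative of the Gauss elimination cited above, not necessarily the variant the author had in mind), the sketch below computes an upper-triangular Cholesky factor P with M = P*P. The function names and the 2 × 2 test matrix are illustrative assumptions.

```python
import math


def cholesky_upper(M):
    """Upper-triangular P with M = P^* P, for a constant hermitian
    positive definite M given as nested lists (real or complex entries)."""
    n = len(M)
    P = [[0j] * n for _ in range(n)]
    for i in range(n):
        # diagonal entry: what is left of M_ii after the earlier rows
        s = M[i][i] - sum(abs(P[k][i]) ** 2 for k in range(i))
        P[i][i] = complex(math.sqrt(complex(s).real))
        for j in range(i + 1, n):
            t = M[i][j] - sum(P[k][i].conjugate() * P[k][j] for k in range(i))
            P[i][j] = t / P[i][i]   # P_ii is real and positive
    return P


def star_times(P):
    """Compute P^* P."""
    n = len(P)
    return [[sum(P[k][i].conjugate() * P[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


M = [[2, 1j], [-1j, 2]]   # hypothetical hermitian positive definite matrix
P = cholesky_upper(M)
print(P)
print(star_times(P))      # reproduces M up to rounding
```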

By interchanging rows and columns it may be assumed
that the diagonal elements $(M)_{11}, (M)_{22}, \ldots, (M)_{rr}$
possess the degrees 2δ₁, 2δ₂, ..., 2δ_r, respectively.
Call the rearranged matrix M₁(p). Then there exists a
permutation matrix Q such that

$$M_1(p) = Q'M(p)Q. \tag{47}$$

M₁(p) is also elementary, paraconjugate hermitian and
positive.


Define a nonincreasing sequence of non-negative integers
σ₁, σ₂, ..., σ_r by

$$\sigma_k = \delta_r - \delta_k \qquad (k = 1, 2, \ldots, r), \tag{48}$$

and the r × r diagonal matrix Ω(p) by

$$\Omega(p) = \mathrm{diag}[p^{\sigma_1}, p^{\sigma_2}, \ldots, p^{\sigma_r}]. \tag{49}$$

Note that σ_r = 0. The r × r matrix

$$M_2(p) = \Omega_*(p)M_1(p)\Omega(p) \tag{50}$$

is polynomial, paraconjugate hermitian and positive.
Moreover, all its diagonal elements have the same degree
2δ_r. It is clear that

$$|M_2(p)| = O(p^{2\sigma}), \tag{51}$$
$$\sigma = \sigma_1 + \sigma_2 + \cdots + \sigma_r. \tag{52}$$

From (48), since δ_r > 0,

$$\sigma < r\delta_r. \tag{53}$$
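The bookkeeping in (48)-(53) is plain integer arithmetic. This sketch assumes the diagonal degrees of M₁(p) are supplied in nondecreasing order (they must be even) and returns the σ_k, their sum σ, and the diagonal degrees of M₂(p), which all come out equal to 2δ_r; the input [0, 2, 4] is hypothetical.

```python
def balance_exponents(diag_degrees):
    """From the even degrees 2*delta_1 <= ... <= 2*delta_r of the diagonal
    of M_1(p), return the sigma_k of (48), their sum sigma (52), and the
    diagonal degrees of M_2 = Omega_* M_1 Omega."""
    assert all(d % 2 == 0 for d in diag_degrees)
    assert list(diag_degrees) == sorted(diag_degrees)
    deltas = [d // 2 for d in diag_degrees]
    sigmas = [deltas[-1] - d for d in deltas]                      # (48)
    # (M_2)_kk = (-p)^sigma_k (M_1)_kk p^sigma_k has degree 2 delta_k + 2 sigma_k
    new_degrees = [d + 2 * s for d, s in zip(diag_degrees, sigmas)]
    return sigmas, sum(sigmas), new_degrees


sigmas, sigma, new_degrees = balance_exponents([0, 2, 4])
r, delta_r = 3, 2
print(sigmas, sigma, new_degrees, sigma < r * delta_r)  # [2, 1, 0] 3 [4, 4, 4] True
```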


M₂(p) may be expanded as a polynomial in p with
constant matrix coefficients:

$$M_2(p) = T_0 + pT_1 + \cdots + p^{2\delta_r}T_{2\delta_r}. \tag{54}$$

Since $M_{2*}(p) = M_2(p)$, $T_k^* = (-1)^kT_k$ (k = 0, 1, ..., 2δ_r);
i.e., the T's are constant hermitian or skew-hermitian r × r matrices.
The important observation is that $T_{2\delta_r}$ is singular; i.e.,
$|T_{2\delta_r}| = 0$, for otherwise (54) would yield


$$|M_2(p)| = O(p^{2r\delta_r}),$$

which contradicts (51) and (53). This deduction implies
that $T_{2\delta_r}$ contains a principal minor Γ of order ν × ν,
1 ≤ ν < r, located in its upper left-hand corner (Fig. 1),
which is nonsingular and such that the minor Γ₁ created
by adding the (ν + 1)th row and column to Γ is singular.

[Fig. 1. Structure of $T_{2\delta_r}$. Its upper left-hand (ν + 1) × (ν + 1) corner is
$\Gamma_1 = \begin{bmatrix}\Gamma & x \\ x^* & t_{\nu+1,\nu+1}\end{bmatrix}$, a singular hermitian matrix, where
Γ is a nonsingular hermitian ν × ν matrix and x is a ν-dimensional column vector.]

Suppose this assertion is false. Then, since the (1, 1)
element in $T_{2\delta_r}$ is not zero (remember that all diagonal
entries in M₂(p) have degree 2δ_r), the upper left-hand
corner 2 × 2, 3 × 3, ..., r × r minors of $T_{2\delta_r}$ must all
be nonsingular. But the last minor is precisely $|T_{2\delta_r}| = 0$,
a contradiction, Q.E.D.


By adding a proper linear combination of the first ν
rows of $T_{2\delta_r}$ to the (ν + 1)th row and the conjugate linear
combination of the first ν columns to the (ν + 1)th,
$t_{\nu+1,\nu+1}$ is reduced to zero, and no other diagonal element
is affected. Hence, for the correct choice of constant
r × r nonsingular matrix Q₁,

$$\tilde T_{2\delta_r} = Q_1^*T_{2\delta_r}Q_1 \tag{55}$$

has a zero element in the (ν + 1, ν + 1) place. From (54),

$$\tilde M_2(p) = Q_1^*M_2(p)Q_1 = \sum_{k=0}^{2\delta_r} p^kQ_1^*T_kQ_1 \tag{56}$$

has a diagonal element in the (ν + 1, ν + 1) position of
degree less than 2δ_r.
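The "proper linear combination" can be written down explicitly. With Γ₁ singular and Γ nonsingular, the Schur complement $t_{\nu+1,\nu+1} - x^*\Gamma^{-1}x$ vanishes, so adding the combination $a = -\Gamma^{-1}x$ of the first ν columns to column ν + 1 (and the conjugate combination of rows to row ν + 1) zeroes the (ν + 1, ν + 1) entry. The sketch assumes T is given as nested lists; the function names are hypothetical. Applied to the leading coefficient T₂ = [[−1, 1, 0], [1, −1, 0], [0, 0, −1]] of the M(p) of Example 1 (as we read that matrix; treat it as an assumption), it yields the Q₁ of (93).

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[complex(v) for v in row] + [complex(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            M[i] = [u - f * v for u, v in zip(M[i], M[c])]
    x = [0j] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def make_Q1(T, nu):
    """Identity, plus a = -Gamma^{-1} x above the diagonal in column nu
    (0-based, i.e. the (nu+1)th column); Gamma is the leading nu x nu block
    of the hermitian matrix T and x holds the first nu entries of column nu."""
    gamma = [row[:nu] for row in T[:nu]]
    x = [T[i][nu] for i in range(nu)]
    a = solve(gamma, [-xi for xi in x])
    r = len(T)
    Q = [[complex(i == j) for j in range(r)] for i in range(r)]
    for i in range(nu):
        Q[i][nu] = a[i]
    return Q


def congruence(Q, T):
    """Q^* T Q."""
    r = len(T)
    TQ = [[sum(T[i][k] * Q[k][j] for k in range(r)) for j in range(r)]
          for i in range(r)]
    return [[sum(Q[k][i].conjugate() * TQ[k][j] for k in range(r))
             for j in range(r)] for i in range(r)]


T2 = [[-1, 1, 0], [1, -1, 0], [0, 0, -1]]
Q1 = make_Q1(T2, nu=1)
reduced = congruence(Q1, T2)
print(Q1)
print(reduced[1][1])   # the (nu+1, nu+1) entry, now zero
```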
The matrix

$$M_3(p) = \Omega_*^{-1}(p)\tilde M_2(p)\Omega^{-1}(p) \tag{57}$$

is paraconjugate hermitian, positive and elementary.
Only the latter statement needs proof. According to (50),
$(M_2)_{kl}$ is divisible by $p^{\sigma_k+\sigma_l}$, and according to (56) and
the definition of Q₁, $\tilde M_2(p)$ differs from M₂(p) only in its
(ν + 1)th row and column. More specifically,

$$(\tilde M_2)_{k,\nu+1} = (M_2)_{k,\nu+1} + \sum_{i=1}^{\nu}\alpha_i(M_2)_{ki} \tag{58}$$

(k = 1, 2, ..., r), the α's being scalars. By construction
σ₁ ≥ σ₂ ≥ ⋯ ≥ σ_r. Thus every term on the right-hand
side of (58) is divisible by $p^{\sigma_k+\sigma_{\nu+1}}$.
The same considerations apply to the (ν + 1)th row,
whence, for all k and l, $(\tilde M_2)_{kl}$ is divisible by $p^{\sigma_k+\sigma_l}$, and
M₃(p) is a polynomial matrix. Since

$$|M_3(p)| = \pm|Q_1^*Q_1|\cdot|M(p)| = \text{constant},$$

M₃(p) is elementary, Q.E.D.

But M₃(p) is simpler than M₁(p) because the degree
of its (ν + 1, ν + 1) entry is at least two less than the
one in the same place in the latter, while all other
corresponding diagonal elements have the same degree.
Consequently, after one cycle of the algorithm,

$$M_3(p) = R_{1*}(p)M(p)R_1(p), \tag{59}$$

where

$$R_1(p) = Q\,\Omega(p)\,Q_1\,\Omega^{-1}(p) \tag{60}$$

is an elementary polynomial matrix and M₃(p) is at
least two degrees less than M(p). That R₁(p) is
elementary is almost obvious by inspection. The reader is
invited to supply a formal proof for himself. After a
maximum of δ = rδ_r cycles, M(p) is reduced to a constant
hermitian positive definite matrix $M_\delta = C^*C$, so that
finally,

$$M(p) = P_*(p)P(p), \tag{61}$$

where

$$P(p) = C\,R_\delta^{-1}(p)R_{\delta-1}^{-1}(p)\cdots R_1^{-1}(p).$$

This completes the proof of parts a₁) and a₂).
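A minimal sketch of one cycle of Step 4, equations (48) to (61), applied to a hypothetical 2 × 2 example chosen for illustration (it is not from the paper): M(p) = [[1, p], [−p, 1 − p²]], which is elementary (|M| = 1), para-hermitian, and positive on p = jω. The sketch assumes r = 2, real coefficients, no permutation (Q = 1) and ν = 1, so it is not a general implementation. Here a single cycle already reduces M to the identity, so C = 1 and P = R₁⁻¹.

```python
from fractions import Fraction

# Polynomials are coefficient lists, lowest power first: 1 - p**2 -> [1, 0, -1].


def strip(a):
    a = list(a)
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a


def padd(a, b):
    n = max(len(a), len(b))
    return strip([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                  for i in range(n)])


def pmul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return strip(out)


def shift(a, k):
    """Multiply by p**k; for k < 0 the division by p**(-k) must be exact."""
    if k >= 0:
        return strip([0] * k + list(a))
    assert all(c == 0 for c in a[:-k]), "not divisible by p**(-k)"
    return strip(a[-k:]) if len(a) > -k else [0]


def matmul(A, B):
    C = []
    for i in range(len(A)):
        row = []
        for j in range(len(B[0])):
            acc = [0]
            for k in range(len(B)):
                acc = padd(acc, pmul(A[i][k], B[k][j]))
            row.append(acc)
        C.append(row)
    return C


def para_conj(A):
    """A_*(p) = A'(-p) for a real polynomial matrix."""
    flip = lambda a: [c if i % 2 == 0 else -c for i, c in enumerate(a)]
    return [[flip(A[j][i]) for j in range(len(A))] for i in range(len(A[0]))]


M = [[[1], [0, 1]],          # hypothetical M(p) = [[1, p], [-p, 1 - p^2]]
     [[0, -1], [1, 0, -1]]]
r = 2
deg = [len(M[k][k]) - 1 for k in range(r)]   # [0, 2]: already ordered, Q = 1
delta = [d // 2 for d in deg]
sigma = [delta[-1] - d for d in delta]       # (48): [1, 0]

# (50): (M2)_kl = (-p)^sigma_k (M1)_kl p^sigma_l
M2 = [[shift([(-1) ** sigma[k] * c for c in M[k][l]], sigma[k] + sigma[l])
       for l in range(r)] for k in range(r)]

# (54): leading coefficient T_{2 delta_r}; it must be singular
top = 2 * delta[-1]
T = [[M2[k][l][top] if top < len(M2[k][l]) else 0 for l in range(r)]
     for k in range(r)]
assert T[0][0] != 0 and T[0][0] * T[1][1] - T[0][1] * T[1][0] == 0  # nu = 1

# (55)-(56): add a times row/column 1 to row/column 2, a = -Gamma^{-1} x
a = -Fraction(T[0][1]) / T[0][0]
Q1 = [[[1], [a]], [[0], [1]]]
M2t = matmul(para_conj(Q1), matmul(M2, Q1))

# (57): M3 = Omega_*^{-1} M2t Omega^{-1}
M3 = [[shift([(-1) ** sigma[k] * c for c in M2t[k][l]], -sigma[k] - sigma[l])
       for l in range(r)] for k in range(r)]
assert M3 == [[[1], [0]], [[0], [1]]]   # constant and equal to 1_2, so C = 1_2

# (60)-(61): P = C R1^{-1}, with R1^{-1} = Omega Q1^{-1} Omega^{-1} (Q = 1)
Q1inv = [[1, -a], [0, 1]]
P = [[shift([Q1inv[k][l]], sigma[k] - sigma[l]) for l in range(r)]
     for k in range(r)]
print(P)                                 # coefficient lists of [[1, p], [0, 1]]
assert matmul(para_conj(P), P) == M      # M(p) = P_*(p) P(p)
```

The shifts by negative powers of p are exactly the divisibility claims proved after (58); the assertion inside `shift` would fail if they did not hold.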




As regards a₅), note that the analyticity of G(p) on
p = jω implies that θ(p) is polynomial, which in turn
implies that $D^+(p) = \theta(p)\lambda(p)$ is analytic in a strip
Re p > −τ, τ > 0. This strip is completely determined
by λ(p). Thus $H(p) = P(p)D^+(p)F(p)$ is also analytic
in Re p > −τ.

Under the hypotheses of a₆), $\theta(p) = 1_r$ (see the corollary
to Theorem 1), and

$$H^{-1}(p) = F^{-1}(p)\lambda^{-1}(p)P^{-1}(p) \tag{62}$$

is analytic in some strip Re p > −τ₁, τ₁ > 0. By d) of
the S.M. lemma, the reality of G(p) permits all associated
matrices to be chosen real and therefore H(p), V(p)
and T are real by construction. This terminates the proof
of Theorem 2.

Corollary 1: Any factorization of the form $G(p) =
L_*(p)L(p)$ in which L(p) is m × n, m ≥ r(G), is given by

$$L(p) = V(p)\begin{bmatrix} H(p) \\ 0_{m-r,n} \end{bmatrix}, \tag{62a}$$

where V(p) is an arbitrary m × m paraconjugate unitary
matrix.

Proof: Clearly, L(p) must be of the form L(p) =
U(p)H(p), U(p) being an m × r rational paraconjugate
unitary matrix. The result now follows by choosing V(p)
to be any m × m paraconjugate unitary matrix with
U(p) incorporated into its first r columns; i.e.,

$$U(p) = V(p)\begin{bmatrix} 1_r \\ 0_{m-r,r} \end{bmatrix}, \tag{62b}$$

V(p) an arbitrary m × m paraconjugate unitary matrix,
Q.E.D.
Corollary 2: If G(p) is polynomial, H(p) is polynomial.
Proof: If G(p) is polynomial, $D^+(p)$ is polynomial.
Thus, by (45), H(p) is polynomial, Q.E.D.
Example 1: To see how the above theorem works, 
consider the nontrivial 3 X 3 para-hermetian matrix 


| 1 1 ‘I 
2 eos 0 
Lp Die pe) 
1 p —2 1 
G: = —_ on = 3 a. 63 ' 
(”) pl — p) | pl — p’) | 290 — p’) 637 
| 1 
0 Se : 
2 2p =p.) Ege Ren 


It is easily verified that all principal minors are positive
on the real-frequency axis. Hence G(jω) > 0.

Step 1: The normalized lowest common multiple of all
denominators is

$$g(p) = \psi_1(p) = p^3 - p = p(p^2 - 1). \tag{64}$$


a 
. 
feel eae 
ee OO | P27 2), | 
ErO2 pide ea 
First, the procedure described in Gantmacher [8]² is
used to reduce $\hat G(p)$ to Smith canonic form:

a) Interchange the first and second columns. This
amounts to multiplying $\hat G$ on the right by

$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \tag{66}$$

and the result is

htaeer eC 

GS 2 =n op =p /2'1. (67) 
p20.) py 


b) Multiply the first row of G₁ by −p and add to the
second. This is accomplished by multiplying G₁
on the left with

$$\begin{bmatrix} 1 & 0 & 0 \\ -p & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \tag{68}$$

and the result is
Parte pn 10] 
G,=| 2 p+p? —p/2'. (69) 
Fy Oe) ee 


c) Interchange the first and second rows of G₂ and
multiply the first row by ½. This is accomplished
by multiplying G₂ on the left with

$$\begin{bmatrix} 0 & \tfrac{1}{2} & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \tag{70}$$

and the result is
: i mn 
1 2 ae an 
G3 = (71) 
=p aa 0 
| p/2 0 —p | 


d) Now multiply the first row by p and −p/2 in
turn and add to the second and third, respectively.
This is achieved by multiplying G₃ on the left with

$$\begin{bmatrix} 1 & 0 & 0 \\ p & 1 & 0 \\ -p/2 & 0 & 1 \end{bmatrix}, \tag{72}$$

the result being

² See pp. 134-139.
Rieeaere | 
5 —p/4 
4 = 2 
Oe ; Lp /A (73) 
So ml, 2 
p 4 gP | 


e) Multiply the first column by —(p + p*)/2 and 
p/4 and add, in the same order, to the second and 
third. This is accomplished by multiplying G, on 


the right with 


0; 


and the result is 


1 


K} 


a 
=p Ce ee 
; (74) 
1 0 
0 1 
0 0 
p — Pp 
2 
2 ee (75) 
piers 
4 gP 


f) Interchange the second and third columns; multiply 
the second row by —7/2 and add to the third; 
multiply the second column by 2p” and add to the 
third; multiply the second column by —2 and add 
to the third, and finally multiply the second column 
by —4 and the third by —3. 


The end product is 


G = SuG-O: 


where 


and 


0s 


i 0) 

<a 0 ; (76) 
0 0 p — 3p 

1 0 0 

Onno (77) 
10 —7/2 1 

lL © 0 

ja aes (78) 
0 7-4. l= 


180 IRE TRANSACTIONS ON INFORMATION THEORY July, 


Letting 


C(p) = Ss onon 


and 


EF —=10/0;05— 10 eae 
0 —4 1—p' 


G = CDF where D = g'G, is the 8.M. canonic form: 


Dip) = diag | E a ad 
RM NG CMe oe aie anh 


Step 2: Clearly 


Dias wa 
Nope ie a ! z 
o p + 1 ip + il ) Dp + il oi 
6(p) = diag E 1 | 
Pp fo} p ») ) bs 
D*(p) = Xp) (p) 
V3] 
De 1 A os pe 
Slo@+D pti’ pti J 
p es 
a as 1 1 2 
ENING ae as p+ ) 
Bip) = Mp)F(p), 
and [see (14)], 
Nip) =A" —p)B (@): 
Step 3: By direct matrix multiplication, 
M(p) = 0(—p)N(p)o “(p) 
[2-p p 0 
=| ~ WE ay Ay = DAS 
eae —4p — 2V/3 l-—p 


It is easily verified that M(p) is elementary, para-her- 


metian and positive. 


(79) 


(80) 


(81) 


(82) 


(83) 


(84) 


(85) 


(86) 


(87) 


(88) 


Step 4: The remaining task is to factor M(p) into 
P'(—p)P(p), P(p) being polynomial. Observe that all 


diagonal elements are of second degree, 7.e., 26, 


Thus M(p) = M.(p) and [see (54)], 


=a 0 | 
a6 = tN, 


= 2. 


(89) 


‘elements of 17/,(p) in nondecreasing order of degree from) 


Meh a ‘| (91) 
[eee } 


and ft). = —1 (refer to Fig. 1 for the meaning of the 
symbols). 

The result of adding the first row and column in 
to the second row and column, respectively, is 


Qi M.(p)Q, = M;(p) = M,(p) 


2-—p 2 0 
=| 2 16 4p —2V3|, (99) 
0 4p = 25/5 
where 
Grae oy 
Q, = 17 0). (93) 
OgsOa aia 


The (2, 2) diagonal element has been reduced in degree 
and the first cycle of the algorithm is over. 


upper left-hand corner to lower right: | 
Q5M(p)Q> = M5(p) 


i 16 2 
= ) 
Ap eNO 0 


equidegree: 


Q(—p)M5(p)2(p) = M,(p) 


~16p" —2p —4p? + 2V3p! | 
= 2p 2-—p 0 : (96) 
[—4p°-2V3p 0 ea 
Op) = diag (pyle (97), 
The coefficient matrix of p” is 
—16 O —4 
T=) (Ole Oe (98) 


I Youla: On the Factorization of Rational Matrices 


lence 


I, = (99) 


—16  b f= 
L © —] 


ultiply the first row and column of M,(p) by —} and 
d to the third row and column, respectively. Thus 


M o(p)Q3 = M-(p) 


—16p" —29 2V3p| 
= 2p ie Pi atae 02s | ne OD) 
—2V3p p/2 1 
eOarest 
On=a,0 1-0 (101) 
Om0 EO 


1e second cycle is brought to an end by performing 
e operation inverse to (96): 


H(—p) i (p) 93(p) = Me) 


16 2 S03 
= 2 2-p —p/2 (102) 
[-2V3 _p/2 1 
a one more cycle is necessary. Interchange the 
t two rows and columns of Mx(p): 
1g 2173 
MQ. = Mop) =|-2V3 1 p/2 |, (108) 
| ean? —p/2 2—p! 
1 tOn0) 
On = 02 OP Lis (103a) 
(OU. wise CE | 


yw make all diagonal elements of IM,.(p) equidegree: 


| —p)M(p)93(p) = M,0(p) 


—16p> 2p’V3 —2p | 
=|2°V3  —p’ —p’/2|, (104) 
29 =—p/2 2-p 
Q3(p) = diag [p, p, 1]. (105) 
us, 
RSG aVey 20 
Dee W273 1 =1/2)|; (106) 
kOe Soe 
peer ae fe wT en 8 (107) 
l2V3 1 


181 


Multiply the first column of 17,,(p) by — 1/3/4, the 
second column by —2, sum them and add the result to 
the third. Do the same for the rows. Then, 


—16p? 2p’°V3 —2p 


QMo(p)Q@s = Mulp) = |2p°V3 —p? 0 |, (108) 
2p 0 2 
Qs = 1 —-2 (109) 
LO- (Oe ee 
Lastly, 
23 "(—p)M,(p) 95"(p) = M2(p) 
16 29/3 
=| -2V3 1 0}, <t0) 
2 iy 
a constant, real, symmetric positive-definite matrix. 


Using formula (42) of Gantmacher [8],’ it is easy to 
decompose M,.(p) into a product of triangular factors: 
M,.(p) = C’C, where 


V3 
ee lh 
CA anne Nee (111) 
2 
0 <0 val 


Collecting all matrices and carrying out the calculation 
gives M(p) = P’(—p)P(p), where 


P(p) = COQ,(p)Q; 037 (Qe Cap) Os Se @) OF One 2) 
1/2 7/2 7 ave 
“|ptB o-ME an | 8 
1 il 0 
Finally, the desired expression for H(p) is 
P(p) D’(p)F(p) = H(p) = 
f 1 oe 
2p(p + 1) jer Il 
al par V 3/2 
Te IOS Coma tye Seehay et era 
1 i ' 
Pace sepia) 


3 See vol. 1, p. 38. 


182 


and is evidently analytic together with its inverse in 
Re p > 0. Of course, many of the calculations appearing 
in Example 1 can be abbreviated and have been carried 
out in their entirety in order to give the reader a clear 
picture of the mechanism underlying the algorithm. 

The distinguishing feature of Theorem 2 is that it 
yields a factor matrix H(p) that is analytic together with 
its right-inverse in Re p > O. In problems of the Wiener- 
Hopf type this property of H(p) is of crucial importance. 
On the other hand, some network problems, such as the 
synthesis of lumped, passive n ports [9] merely require 
an H(p) analytic in Re p > 0, with no restrictions on the 
analytic character of H~*(p). In this case, it is possible 
to exhibit a decomposition G(p) = H,(p)H(p) in which 
H(p) is upper-triangular and to give explicit formulas 
for the computation of its elements. 

Theorem 3: Let G(p) be a rational n X n paraconjugate 
hermetian matrix of normal rank n which is non-negative 
on the real-frequency axis p = jw. Then there exists a 
rational upper-triangular n X n matrix H(p), such that 


a,) G(p) = H,(p)H(p). 

a:) H(p) is rational and analytic in Re p > O. 

a;) Under the assumption that all entries in H(p) are 
relatively prime, the elements of any row have 
no common zeros in Re p > 0. 

H(p) is uniquely determined up to within a con- 
stant diagonal unitary matrix multiplier on the 
left; z.e., if H,(p) is upper-triangular and also 
satisfies a,), @2) and a3), H,i(p) = TH(p), where 
T = diag [ee e*"|, the 4's bene real 
constants. 

Any factorization of the form G(p) = L,(p)L(p) 
in which L(p) is upper-triangular and analytic in 
Re p > 0 is given by L(p) = V(p)H(p), where 
V(p) is a regular, diagonal rational n X n para- 
conjugate unitary matrix. 

If G(p) is real, H(p) can be chosen real and JT 
real-orthogonal. Furthermore, L(p) real implies 
V (p) real. 


(4) 


as) 


ag) 


Proof: Consider a,) first, and suppose that H(p) and 
H,(p) are two upper-triangular matrices possessing proper- 
ties a,)—a;). Then, by a,), 


H,(p)H(p) = H,.(p)Hi(p). (115) 
-. Hy.(p)H,(p) = Hi(p)H “(p) = V@) (116) 

and 
V(p) V(p) = 1,- (116a) 


Now (116) shows that V (p) must be both lower- and upper- 
triangular and hence diagonal. Thus H,(p) = V(p)H(p) 
where V(p) is a diagonal paraconjugate unitary matrix. 

By hypothesis, H,(p) is regular so that any right-hand 
pole of a diagonal element in V(p) must be a common 
zero of all entries in the corresponding row in H(p). 
But according to a3) this situation is impossible whence 
it follows that all the diagonal elements in V(p) are 


IRE TRANSACTIONS ON 


INFORMATION THEORY July 


regular paraconjugate unitary functions, 7.é., 
“Blaschke” products. Any such product b(p) has the 
representation 


U 
Ao Repo 0e 
‘”) I] p + Dp, gi 
(P= ple aa Ny (117), 
the zeros of b(p) are all restricted to the right-hand 


p plane. 

On the other hand, if any b(p) appearing in V(p) has 
a zero in Re p > O, this zero is common to all elements 
in the corresponding row of H,(p), since the analyticity 
of H(p) in Re p > 0 excludes any possibility of cancella 
tion. This contradicts a3), and therefore V(p) is constant 
and of the form 


V(p) = diag iene as Thad Se el, 
Q:hD: 4 

Assertion ds) is obvious and the proof of a;) is almost 
identical with that for a,) and is omitted. 

It now remains to demonstrate the existence of an 
upper-triangular factorization H(p) with the attributes 
a,), 2) and a3). Actually it is only necessary to construct) 
an upper-triangular matrix H(p) analytic in Re p > OF 
satisfying a,. For suppose such an H(p) is available. 
Define b,(p), (r = 1,2, --- , n), to be that regular Blaschke 
product formed with all the common right-hand zeros) 
(multiplicities included) of the rth row of H(p). Set 


V(p) = diag [bi(p), be(p), --- , b.(p)]. 


Then H(p) = V,(p)H(p) is upper-triangular and meets, 
conditions a,) —as). 
The concise notation 


the eet 
Hee iia! 
is used to denote the minor of the matrix A formed wit h 


the rows numbered 7,, 72, --: 7, and the columns# 


ki, Ke. ouenene: ) ee Let 
G(p) = g (p)G(p), 
where g(p) is the normalized lowest common multiple 
of all denominators appearing in G(p). Then g(p) = W(p) 
(see the factorization theorem) and 
g(p) = et, (p)t(p), (118) 


the polynomial t(p) being devoid of zeros in Re p > 0. 
Hence, 


e= +1, 


C= 00) = 1. care (119) 


is an n X n polynomial, non-negative, paraconjugate 
hermetian matrix of normal rank n. 

According to Theorem 1 of Gantmacher [8],* G(p) 
can be represented as a product of a lower-triangula 


4 See p. 35. 


a1 Youla: On the Factorization of Rational Matrices 


trix S(p) and an upper-triangular matrix A(p); i.e., 
2. = 


eee 5 GCG) aie: 
poe! a G,1(p) ) (r ~~. i 2; J sf), (120) 
Gees an 
8,x(D) = Sie(P) = ) (121) 
a(t Or oe ) 
oe yh 
d 
mf ed ened hae a RE 
o( ) 
Fe ay he le 
(p) = hus(p) “, 
ae 2°: 
Oh 2a rik 
Ge sie ti lores ee 1D 5 Ty (122) 
which (G, = 1) 
¥ ey il 2 G | ‘ 


r(r = 1, 2, --- , n). These latter inequalities are a 
nsequence of the positive character of G(jw) and the 
sumption that its normal rank is n. 

Now all the G’s and G,’s are polynomials in p. By 
‘pothesis, G,(p) = G(p) which in turn implies that 


ee iy eg (12 -1,8) 
Ree ey FN Bieebie ol)” 


Ci ie re Na) (124) 

particular, 
G,(p) = G,,(P), (r =; 2, oe ,n). (125) 
ace GGw) > 0, G,(ja) = 0, @ = 1, 2, --- , n). Thus 


‘ery paraconjugate polynomial G,(p) can be factored: 
: G,(p) cae Yrx(P)Yr(P) 5 (r = Ne 2, i OE (126) 


e polynomials y,(p) being free of zeros in Re p > 0. 
Set (yo = 1) 


h,(p) = aD: 


yp) ee 
d 
8,,(p) = Yelp) (ae eo ity (128) 
rr Yr—1»(p) r} ) 5) PI 


is obvious by (126) that (120) is satisfied, that h,,(p) 
analytic in Re p > O, and that s,,(p) = h,,.(p), 
oie). erom (122) and (121); 


Aloe eee aus) 
clean ee etien 


Yn—(P)Yn(P) 


h..,(p) ra (129) 


S(p)H(p), where S = (Srx)5 Hl =a (hx) and 


183 
and 
mip k, 2c me lee 
mS P Yu—1«(P)Yus(p) ee P : 
n= ie See oe 2 le reg) (130) 


Hence H(p) = (hyx) Is upper-triangular, analytic in 
Re p > O and obeys G(p) = A, (p)H(p). The matrix 


H(p) = t'(p)H(p) 


meets the desired requirements a,) and a,.), Q.E.D. 

Corollary: Let G(p) be ann X n rational paraconjugate 
hermetian matrix of normal rank 7 which is non-negative 
on the p = jw axis. Then there exist rational matrices 
H,(p), A(p), an n X n permutation matrix Q and a 
regular Blaschke product b(p), such that 


(131) 


b,) H,(p) is r X r, nonsingular, upper triangular and 
analytic in Re p > 0. Moreover, the elements in 
any one of its rows have no common zeros in 
ree 0: 

A(p) isr X (n — r) and b(p)A(p) is analytic in 
hep > 0; 

The r X n matrix 


H(p) = b(p)Hi(p)[1, | A(p)] 


is analytic in Re p > 0 and satisfies H,(p)H(p) = 
Q'G(p)Q; 

For the same choice of Q, H,(p) is uniquely de- 
termined up to within a constant diagonal r X r 
unitary matrix left-multiplier V (p); 

G(p) real implies that H,(p) and A(p) are real and 
V is real-orthogonal; 

bs) Ifr = n, Q may be chosen equal to 1, and b(p) = 1. 


(131a) 


bs) 


bs) 


Proof: Since G(p) is paraconjugate hermetian and of 
normal rank r, it possesses at least one nonsingular 
principal minor of order 7. By permuting rows and 
columns, this minor can be shifted to the upper left-hand 
corner. Thus, for the proper choice of permutation 
matrix Q, 


| Gi(p) | Ga(p) Ir 
Q’G(p)Q = , (316) 
| Go.(p) | Ga(p)_in — r 
Hie HU sath 


where G,(p) is of normal rank r. In addition, G3,(p) = G@3(p) 
and G,,(p) = Gi(p) are both non-negative on p = 
By the definition of rank, 


G.(p) = Gi(p) A(p) 


Jo. 


and 


G3(p) = Ge.(p)A(p) = A, (p)Gi(p) A(p), 


A(p) being a rational r X n — r matrix. Hence, 


Q'G(p)Q = M,(p)Gi(p)M(p), (131¢) 


184 IRE 


where 


M(p) = [1, | A@)], Ap) = Gr"(p)G.(p). (181d) 


Let g(p) be the lowest common multiple of all de- 
nominators appearing in A(p). Then 


—1__ 1, @)G(p) Mp), (1B Le) 


I (p)g(p) 
the matrix M(p) = g(p)M(p) being polynomial. By 
Theorem 3, there exists a matrix H,(p) with the property 
b,) satisfying H,,.(p)Hi(p) = Gi(p). Again, g,(p)g(p) is 
paraconjugate hermetian and non-negative on p = jw 
and so admits the Hurwitz factorization 


ar h,(p)h(p), 


the polynomial h(p) being free of zeros in Re p > 0. 
Evidently g(p) = b(p)h(p) where b(p) is a regular all- 
pass factor. Consequently, 


J4(P)g(P) 


Q’G(p)Q = H,,(p)H(p), where 


_ gp) 
h(p) 


b(p)H,(p)[1, | A(p)], 


analytic in Re p > O by actual 


Hp) A (p)l1, 


] 


I 


(131) 


and b(p)A(p) are 
construction. 
Now suppose that 


Q’G(p)Q = H,(p)H(p) = H,(p)H(p) 

in which rn-r 

H(p) = b(p)H,(p){1, | A(p)] 
and 

Te aie 

A(p) = b(p)Ai(p)U1, | A(p)]. 
Then, 
[1, | ee 1, | A(p)] 

1, | A(p)], M..(p) Ai (p) 1, | Ap). 
oe Eivela = aa 


. Heian ad Hie 


is both upper- and lower-triangular, and, using what 
should by now be a familiar argument, it is concluded that 
H,(p) = V(p)H,(p), where V(p) is an r X r, constant, 
diagonal unitary matrix; b;) is immediate and as for bz) 
note that r = n implies G,(p) = G(p) so that Q = 1, and 
b(p) = 1 are admissible, Q.E.D. 

Example 2: As an illustration, consider once more the 
3 X 3 para-hermetian matrix G(p) in (63). From (64), 
g(p) = p — p = t(—p)t(p) where t(p) = p(1 + p), 
whence e€ = +1 and 


TRANSACTIONS ON INFORMATION THEORY 


=p 1p 0 
Go) = GQ) =|.p 2p = 772 
i p/2- —p’ | 


By direct computation, 
Gi(p) = 
G.(p) = p’ — p = yo(—p)ys(P); 

yo(p) = pl + p); 


—p° + 3p* = y3(—p)ys(p); 


—p = y(—p)yi(p); —-_ylp) = PD; 


G5(p) a 
y3(p) = ace o »); 


=p; 
ail 

a() = 0: 

aie ae 

al} 2) = p'/2. 


Using the formulas (127)—(137), 


hulp) = wee = —p; 
- a Y2( —P) Beets a 
Nae(P) yp) ee 
Vie 
hs3(p) a ys=P) = | : } 
Yo(P) Leal 
aa) 
h ( ) pee ae = =i 
Rai: 1-y,(p) 7 ; 
mith 
nO J = 0; 
1B Cp) 
Pha 
PAC a(} 5 2 p } 
23\P) ~ y(pyye(p) 20 + p) 
Thus G(p) = H+(—p)H7r(p), where 
Hyp) = t '@H@ 
es : ‘ 
jee: p(p + 1) 
2 0 (Wee 3 nae 
pip De i 2@eeb ess 
No 4 
2 
e # e (1 + p)’ 


61 


is obvious that H7>'(p) is not analytic in Re p > 0. 
The reader will perhaps find it interesting to compare 


e matrix 


(p) = Hr(p)H'(p) 


— 


0 0 =i 
v3 
Se ‘bee 
\a 2p += 1) (Dear : (145) 
pave v3 
| ee eds F 
Lp 2 
x1 mlp+%3)) 


regular and para-unitary, in agreement with a,) of 
heorem 2. 
The matrix 


eae 1 ‘ 
| pra A p(p + 1) 
0 DQ = 1 es, 
fT -(p) = | pip +i) | %pt+ iy (146) 
| | i‘ a ue 
0 0 a 
ee (Chea ye) eel 


also a regular upper-triangular 
'(—p)H(p), but unlike (144) the 
-e relatively prime with respect 
learly, 


solution of G(p) = 
elements in any row 
to right-hand zeros. 


H(p) = V(p)H7(p) 


here 


le WEA ei 


d 2a 
V(p) = diag | 1, 1, —=—— 
P) lag AE 

Be aioe 


IV. APPLICATIONS 


Problem 1: Solve the integral equation (1) by the 
fiener-Hopf technique subject to the following  re- 
rictions: 


wr) G(p) is rational and has the properties listed 
| under (3); 

W2) G(p) and G'(p) are both analytic in a strip 
—n < Rep<1,1> 0; 

Ws) 

| ros) 


Ep) = | e(the”' di 


(147) 


has a strip of convergence intersecting the interval 
—n < Rep < 7». 


Youla: On the Factorization of Rational Matrices 


185 
Solution: Let 


7(p) with the H(p) of (114) and to assure himself that y(t) = / Kt — 7)W@) dr — e@), 


(co <a icon) (148) 


Then y(t) = 0, ¢ > O and its bilateral Laplace transform 


¥@) = f ye" at (149) 
is analytic in some left half plane. 
Transformation of both sides of (148) yields 
Y(p) = G(p)F(p) — E(p) (150) 


in some common strip; F(p) is the transform of W(r); 7.e., 


F(p) = { a W(ne” dr. (151) 


The physical realizability of the filters 
W,(7), W.(r), eee) W,.(r) 


implies that F(p) is also analytic in some right half plane. 

According to Theorem 2, G(p) = H,(p)H(p) where 
H(p) is real, rational and analytic, together with its 
inverse in —y < Re p. From (150), 


H,'(p)¥(p) = H(p)F(p) — Hy '(p)E(p). 


In general, H,'(p)E(p) is not analytic in either half 
plane, and one must resort to the usual artifice of de- 
composing it into the sum 


Hy (p)E(p) = {H,'@E)}. + {Hy (pE@)}- 


in which the first factor on the right is analytic in a 
half plane Re p > —y, » > O, and the second in Re p < uy. 
Inserting (153) into (152) and rearranging gives 


H,(p)X¥(p) + {HY @E)}- 
= H(p)F(p) — {H,'(p)E(p)}.- 


The right-hand side of (154) is analytic in some strip 
Rep > —, m > O, and the left-hand side in Re p < +y,. 
Thus the right-hand side is an entire matrix function of p. 
The simplest solution is obtained by setting this entire 
function equal to the zero matrix. Thus 


F(p) = H'(p) {Hy '(p)E(p)} +, 


and its strip of convergence is some right half plane. 
The only aim of the above derivation is to indicate how 
the factorization idea enters into the Wiener-Hopf tech- 
nique; most of the details concerning rigor have been 
purposely omitted. Suffice it to say that these details are 
not too difficult to fill in for Gs meeting conditions 
w,)—w;). Formula (155) highlights in a most emphatic 
manner the importance of the requirement that H™‘(p) 
as well as H(p) be analytic in Re p > —y. The filters 
defined by (155) are not necessarily stable. 


(152) 


(153) 


(154) 


(155) 


186 


The case in which G(p) is of normal rank less than 1 
is singular and not important as far as the physical 
applications are concerned because it represents a situa- 
tion in which the noise can be completely obliterated by 
an appropriate selection and interconnection of differ- 
entiators. For, if r(@) < n there exists a nontrivial poly- 
nomial n vector F(p) Gine 16 - + ty) 5. SUCHthat 
G(p)F(p) = 0, and the weighting functions 


ayo d } z. 
W,(7) = nl (k = 1 2, aa at); (156) 


do the trick. 

Another interesting question is that of generalizing 
the concept of ‘‘flat’’ noise to the multivariable case. 
Fortunately, this turns out to be unexpectedly simple: 
The k-dimensional noise process n(t) = (m1, M2, +++ , Mx)’ 
is said to be flat or ‘‘white’”’ if its associated spectral 
density matrix G(p) is an elementary polynomial matrix. 
Thus its entries are polynomial in p and its determinant 
is a nonzero constant independent of p. One justification 
for this definition is the following. Suppose it is desired 
to design a k-channel ‘‘matched”’ filter [2]. The appropri- 
ate integral equation is 

{| Ki aw (ads Ses ae SO 
0 
where s(t) = (8,, So, -++ , 8)’ is the column vector of the 
known channel pulse shapes, s,(¢), so(t), --- , s,(t), and 
t) is the detection instant. Transforming both sides of 
(157) over the doubly infinite range (— © <t < ~) gives 


F(p) = €°G"'(p)S(—p); 


a (158) 

Si) = {est dt. 
As a rule, the F(p) described in (158) cannot be made 
physically realizable no matter how large a delay f is 
incorporated into the design. There is one notable ex- 
ception, however, and this occurs when G(p) is an ele- 
mentary polynomial matrix and s(é) is of finite epoch; 7.e., 
when s(t) = O fort < —T,|T | < &. To see this, let 
G*(p) = (Inn), the l’s being polynomials in p. The opera- 
tional inverse of (158) yields the weighting functions 


k 
W,(r) = 2D tle — ty), @ He 2. 27-9) ky). (159) 
Titig? 2 T50W (7) S= OF PAO GS) 1 ee eh and. 
realizability has been achieved at the expense of system 
delay. Eq. (158) generalizes Dwork’s well-known single- 
channel result [10]. 

Problem 2: Given an n X n rational matrix A(p) of 


normal rank r, exhibit a factorization of the form 
A(p) = V(p)H(p), where 


1) V(@) is an n X r paraconjugate unitary rational 
matrix, and 

2) H(p) isr X n, rational and analytic together with 
its right inverse in Re p > 0. 


IRE TRANSACTIONS ON 


INFORMATION THEORY July) 


Solution: The paraconjugate hermetian matrix G(p) = 
A,(p)A(p) is n X n, of normal rank 7 and positive o 
p = jw. By Theorem 2, Corollary 1, there exists an r & 7 
rational matrix H(p) analytic together with its right im 
verse H~'(p) in Re p > 0, such that G(p) = H,(p)H(p 
and A(p) = V(p)H(p), V(p) being an n X r paracon 
jugate unitary matrix, Q.E.D. Note that V(p) is analytie¢ 
in Re p > 0 if an only if A(p) is analytic in Re p > O@ 
Moreover, H(p) and V(p) are unique up to within a 
constant r X r unitary matrix multiplier on the left and 
right, respectively. Lastly, 1, — A,(jw)A (jw) = O implies 
that 1, — H,,(jw)H(jw) = 0. Thus, it is possible to factor 
every rational matrix into the product of a “matrix all 
pass’’ V(p) and a “‘minimum-phase”’ matrix H(p) without 
destroying either its passive or rational character. 

The next problem bears on the structure of lumped, 
passive, lossless 7 ports [7]. 

Problem 3: Investigate the structure of rational n X 
paraconjugate unitary matrices V (p). 

Solution: Suppose that V,(p)V(p) = 1,, and let 


e(p) @(p) aD) 
ACER OR ON 


be its S.M. canonic form. Then the e’s and y’s are 
monic polynomials such that e; | @ | --- | e.(p) andy 
Yr | Wn. | ++: | Wi(p). The notation f | g means that f¥ 
divides g. In addition, e,(p) and y,(p) are relatively | 
prime, (r = 1, 2, --- , n). By definition, there exist two 
elementary n X n polynomial matrices A(p) and B(p) 
such that 


D(p) = diag | 


V(p) = A(p) D(p)B(p). 
Since! VG) = Ve), 


B,(p) D,(p)A,(p) = B-'(p) D"(p)A~“(p). 


Now, except for possible plus and minus signs, D,(p) i | 
already in canonic form, while the 8.M. canonic form 
corresponding to D~"(p) is 


. Vrlp) Vn—1(D) 
om Ee Sep oe 


(162) 


col 


») 


and is achieved by merely permuting the rows and } 
columns in D~'(p). By the uniqueness part of the S.M. 
lemma, 


é(B) = 6 Varsie(P)) C= 


the e's being either +1. Hence, the S.M. canonic form 
of V(p) may be written as 


-,n), (163) 


<= Bie ae) tat) | 
Dip) = sing [Ye Yel. Yall], gg 
() 81.) val) vol “6a 
where 
2 = diag [a, @, >> , &| (164a) | 
and 
¥,.(p) is prime to y,-,.1(p), (7 = 1,2, +--+ ,n). (164) 


a 


low 


nce V = ADB, 
| Vip) | = constant X [J YrslP) (165) 
r=1 eG) 

Let ¥,(p) possess the distinct zeros p,, po, -** , Dy, 
ith respective multiplicities 714, 11.2, Pte Nene 
nce Wn | Wri | -*: | vs, 
eK Ot a Pic (P 2) = (P= py) 
vo(p) = (p — pi)'"*(p — pr)" + (p = D)", (166) 
eee (Daa 2a) 1 (P= Po) at eps) 
here 
) Tees oes: wonwee Bi 

Ty. = Poo = we = Pr2s (167) 

Tip 2 lay = ee Be Tay 


a factors appearing in tableau (166) with nonzero 
ponents are called the elementary divisors of V(p), 


miesheuntegers 7,7, b= 1,2) 96) nid = 1y2e ees y) 
-e its indices. The total indices are the » integers 
Pee (ee 1, ete ye (168) 
4=1 


Now suppose that | V(p) | = a constant independent 
* p. From (165) 


(p) v(p) >> ¥,(p) 


= constant X ¥1.(p) Po.(p) +++ Wns(p), 


| (169) 
>, using (166) and (168), 


Il @-—»)* = + I @ +5)". (170) 


his implies that every zero p, is accompanied by the 


ro — p,, and their associated total indices are equal. Since 
) is analytic on p = jw, Dy ~ 


— Pi (k cd eine PD 
hus a paraconjugate unitary matrix has constant de- 
.rminant if and only if any pole p, has the same total 
dex as the pole —j,. It is an immediate corollary that 
ay regular V(p) with constant determinant must be a 
nstant unitary matrix. The restriction that y,,(p) be 
ime to W,-,1:(p) imposes some further structure limi- 
tions. Thus y,(p) = 1 irrespective of the choice of 
(p). For example, if n = 2, 


vi(p) = ¥1.(p), 
W2(p) ae ily 


(171) 
(172) 
D(p) = diag [vi '(p), Wi(p)]. (173) 


n = 3 there are several possibilities which 
plained by dividing the indices (71, M2, 


are best 
) (ee) 


Youla: On the Factorization of Rational Matrices 


187 
into two classes (Trias Ti2, eee Pi) and (i 12, as ioe 
writing 

y/2 = 
(po) = IL @ — p)™@ +B) (174) 


and considering what the situation must be like with 
respect to a single zero, say, p,. 

Recall that Y.,(p) must be prime to y¥.(p). Thus, if 
Y.(p) contains the factor (p — p,), it cannot contain the 
factor (p + p,). Suppose, for definiteness, that 7, > ri. 
Then 


v(p) = (p — pr) “(p + Br), 
vo(p) = (p — pi)”, (175) 
W3(p) = 1, 


where: yy fey = "Tiny Tan Se LE ie slater 
requirement of equal total indices is impossible to meet. 
Therefore, any pole p, of multiplicity r,5 > 0 of a para- 
conjugate unitary matrix with constant determinant 
must be accompanied by the pole —j, with multiplicity 
Rey where a orot eran. 

The canonic form of a paraconjugate unitary matrix 
is completely delineated in (164)—(164b). Conversely, 
given a 


a WVne(P) Vn—1-(p) bl) | 
Dip) =d tl ae? 
) ae | O Wp) > he) Nee 
in which the w’s are monic polynomials satisfying 
a) Vn | Wn—-1 | Wn—2 | bie: V1, and 


b) Wr. (p) is prime to Viera), (r a i 2, ik foe n), 


does there exist a paraconjugate unitary matrix whose 
canonic form (up to within plus and minus signs) is 
D(p)? A complete and simple answer is available for 
regular matrices. 

Theorem 4: The matrix D(p) is the canonic form (up to 
within plus and minus signs) of a regular paraconjugate 
unitary matrix V(p) if and only if y, | wri |--: | 
and y,(p) is a strict Hurwitz polynomial. 

Proof: The ‘‘if” part is obvious. As regards the ‘‘only if” 
part consider the paraconjugate unitary matrix 


se ve eee Yate) | 
Does [ no)? vw)? d@f 8 
Since ¥,(p) is strict Hurwitz and y, | Yn-1 | --- | wa, all 


y’s are strict Hurwitz and y,,(p) is automatically prime 


to ¥,(p), (7 = 1, 2, --- , n). Now the canonic form of 
V(p) is 
ee A ero) 5) | 
dia | ) eae ae) ) 
mS Lap)? 82(p) 6,(p) 


the polynomials 6;, 62, --- , 0.(p), possessing the proper- 
ties a) and 6) listed under (176). By either direct argument 
or an appeal to Theorem 5.29 of McMillan [5], it is 

5 The author wishes to take this opportunity to point out that 
the part of the footnote appearing on p. 194 of Youla [7] which 


asserts that every para-unitary matrix with constant determinant 
is a constant matrix is incorrect. 


18S 


easily established that ¥,(p) = 6,(p), (7 = 1, 2, «++ , 7). 
Thus (177) is a regular paraconjugate unitary matrix 
with the desired canonic form (176), Q.E.D. 

It now follows that the most general regular para- 
conjugate unitary matrix V(p) with the canonic form 
(176) is given by 


Vip) = A(p) V(p)B(p), 


where A(p) and B(p) are two elementary polynomial 
matrices. A method for choosing A(p) and B(p) is the 
subject of Theorem 5. 

Theorem 5: An elementary polynomial matrix A(p) is 
the left-hand factor of a V(p) defined by (178) if and 
only if the matrix G(p) = V,(p)A,(p)A (p)V(p) is poly- 
nomial. 

Proof: From (178) 


G(p) = V,(p)A, A) V(p) = B,'PB(p). 


Hence G(p) is an elementary polynomial matrix. Con- 
versely, let G(p) be polynomial. Since | G(p) | = 
A,A|-|V,V | =|A,A | = constant, G(p) is actually 
elementary. Clearly, G(p) is paraconjugate hermetian and 
positive on p = jw and it follows, by Theorem 2, that 
there exists an elementary polynomial matrix B(p), such 
that G(p) = B,'(p)B'(p); t.e., the matrix V(p) = 
A(p)V (p)B(p) is paraconjugate unitary, Q.E.D. 

Corollary: Let D(p) have the properties [176, a) and b)] 
and let A(p) be an elementary polynomial matrix, such 
that D, (p)A,.(p) A (p) D(p) is polynomial. Then there exists 
an elementary polynomial matrix B(p), such that 
V(p) = A(p)D(p)B(p) is paraconjugate unitary. The 
S.M. canonie form of V(p) is D(p). 

Theorem 6 |6|: Let 


where the y’s are monic, ¥, | ¥,-1 | «+: | ¥i(p), and y,(p) 
is strict Hurwitz. Let A(p) be an arbitrary elementary 
polynomial matrix. There exist two elementary poly- 
nomial matrices P(p) and F(p), such that 


Vip) = A(py©f(p)F (pv "(p)P(p) 


is a regular paraconjugate unitary matrix with the 8.M. 
canonic form (176). 

Proof: Consider the positive, polynomial, paraconju- 
gate hermetian matrix G(p) = W(p)A,(p)A(p)¥, (p). 
The key observation is that G(p) and W,(p) W(p) = G(p) 
possess the same 8.M. canonic form. To prove this, it is 
necessary to show that the greatest normalized common 
divisors of all l-row, 2-row, --- , n-row minors of G(p) 
and V(p) are identical. Obviously, the greatest common 
divisor of all r-row minors of G(p) is 


d,(p) = 6,.(p) 6,.(p) , 


(178) 


(179) 


(180) 
where 
6,(p) — Wn—r+1(P) Wn—r+2(P) nad Vn(p) , 


eh 2 esa) (181) 


IRE TRANSACTIONS ON 


INFORMATION THEORY July 


Denote the greatest common divisor of all r-row minors 
of G(p) by d,(p). Then d,(p) | d,(p), (rf = 1, 2, --- , n)mm 
If d,(p) # d,(p), 


dp) = n,(p) d,(p), 0), 


n.(p) being a polynomial of nonzero degree. Consider the © 
r-row minor 


y= 1,2,--° (182) 


1, 12 


“| 
n—-r+2 


i eral 
This minor is formed with the rows numbered 2,, 72, «++ , 2, 


and the last r columns. Let the corresponding minors in 
A, (p)A(p) be denoted by 


n—-rtlon-—-r+2 n—-l1on 


boa a (183) 5 


1 — Lae 


From the form of G(p), 


al 1, Vg eo Vr=4 ‘ 


ln =r bl a7 a 
= Wii. See Wi, Wn—r+is ae Wns 
( ; : 
‘K| Vy 2 tr—1 : (184) 


in—-rtin-r+2.:-- 


The right-hand side of (184) must be 
n,(p)d,(p) or, by (180), 7,(p) must divide 


divisible by | 


ViVi eve 
9,(p) | 
ae) Aan ‘ ee: "| (185) | 
n—-r+l1l*n-—-r+2 nln 
Since ¥, | ra | --: | ti, 6-(p) | Wii, «+> W:,, and this 


polynomial quotient is strict Hurwitz. Noting that | 
| G(p) | = constant X d,(p), it is clear that any zero of 
n-(p) must be a zero of ¥,(p)y1.(p). If n,(p) does possess” 
a right-hand zero po, (p — po) must be a factor of all 
K’s appearing in (185) for everyone of the "C, choices of 
11, I2, ++: , 1,. In a similar manner, by arguing with 
minors formed with the last r rows of G(p), it can be. 
concluded that if p, is a left-hand zero of 7,(p), the linear | 
factor (p — po) must divide the "C, minors 


pate = ar 
|” TR ae ; 


on ip A tere gah aa eT 
Consequently if ,(p) possesses either a left- or right- 
hand zero po, at least one row or column of the rth com-_ 
pound [8] of A,(p)A(p) is divisible by the linear factor 
(p — po). But this is impossible since any compound 
of an elementary polynomial matrix is an elementary 
polynomial matrix.* Thus 7,(p) = 1, (r = 1, 2, --- , n), 
Q.E.D. 


6 Tf A is an arbitrary n Xn matrix and A, its rth compound, 
1A,| = |Alrer-a. 
| oe 


61 


By Theorem 2, there exist two elementary polynomial 
atrices P™(p) and F~"(p), such that 
) 


G(p) = H,(p)H(p), 
here 
) H(p) = P-'(p)W(p)F"(p). 


hus, the matrix 


V(p) = A(py©,.(p) F (pe "(p)P(p) 


-paraconjugate unitary and regular and has the 8.M. 
unonic form D(p), Q.E.D. 

The fine structure of rational, regular, para-unitary 
jatrices stands completely revealed in the beautiful 
rmula (178) and is an excellent example of the power 
Theorem 2. 

There still remain many difficult problems of classifi- 
ation which the author hopes to discuss in the near 
uture. Some of these problems have been partially re- 
slved in Oono and Yasuura, [6] which is, to date, un- 
oubtedly the outstanding paper on the subject. 


V. CONCLUSIONS 


The purpose of this paper has been to present a readable 
nd systematic account of the more recent developments 
oncerning the difficult but important problem of rational 
actorization of rational matrices, and to illustrate the 
heory by nontrivial examples. The main result is em- 
-odied in Theorem 2, and it would be extremely useful 
. have available a computer routine for this very valuable 
nd fundamental algorithm. The memory requirements 
re probably too severe for present-day digital computers, 
ut the possibility should be explored. 

Since nonrational matrices can be approximated as 
losely as desired by rational matrices, Theorem 2 pro- 
ides, in a sense, an effective solution of the Hilbert 


Youla: On the Factorization of Rational Matrices 


189 


problem for the semi-infinite line and the class of positive 
paraconjugate hermetian matrices [11]. 


ACKNOWLEDGMENT 


The unique work of Oono and Yasuura, for which the author has already expressed his great admiration, is a significant contribution not only to the literature of network synthesis but also to the algebra of rational matrices, and deserves much more attention than has been accorded to it. If the present paper succeeds in improving this situation and stimulating research in this direction, one of its main objectives will have been realized.


REFERENCES 


[1] R. C. Amara, “The Linear Least Squares Synthesis of Continuous and Sampled Data Multivariable Systems,” Stanford Electronics Labs., Stanford, Calif., Tech. Rept. No. 40; July 28, 1958.

[2] D. C. Youla, “The Theory and Design of Multiple-Channel Matched Filters,” Atlantic Res. Corp., Alexandria, Va.; June 25, 1959.

[3] N. Wiener and L. Masani, “The prediction theory of multivariate stochastic processes,” pts. 1 and 2, Acta Math., vol. 98; June, 1958.

[4] H. Cramér, “On the theory of stationary processes,” Ann. Math., vol. 41, ser. 2; 1940.

[5] B. McMillan, “Introduction to formal realizability theory,” Bell Telephone System, Monograph 1994; May, 1952.

[6] Y. Oono and K. Yasuura, “Synthesis of finite passive 2n-terminal networks with prescribed scattering matrices,” Mem. Kyushu Univ. (Engineering), Japan, vol. 14, no. 2, pp. 125-177; 1954.

[7] D. C. Youla, “Physical realizability criteria,” 1960 IRE INTERNATIONAL CONVENTION RECORD, pt. 2, pp. 181-199.

[8] F. R. Gantmacher, “The Theory of Matrices,” Chelsea Publishing Co., New York, N. Y., vol. 1; 1959.

[9] V. Belevitch, “Synthèse des réseaux électriques passifs à n paires de bornes de matrices de répartition prédéterminée,” Ann. Télécommun., vol. 6, pp. 302-312; November, 1951.

[10] B. M. Dwork, “Detection of a pulse superimposed on fluctuation noise,” Proc. IRE, vol. 38, pp. 771-774; July, 1950.

[11] I. C. Gohberg and M. G. Krein, “Systems of integral equations on a half line with kernels depending on the difference of arguments,” Am. Math. Soc. Trans., vol. 14, ser. 2, pp. 217-287; 1960.


Correspondence


Noise In An Amplitude Selective 
Detector* 


The second detector of a superheterodyne 
receiver is commonly referred to as an 
envelope detector and can be represented 
as shown in Fig. 1. The graph identifying 
the nonlinear circuit is called the voltage 
transfer characteristic in which output and 
input voltages are plotted vertically and 
horizontally, respectively. 


Fig. 1—Block diagram of linear detector.


A great deal of information has been published on the behavior of various forms of envelope detector circuits for various types of input signals and noise. Probably the best known is the work of Rice [1], in which he calculated probability functions of the envelope of a sine wave added to Gaussian noise. Although the circuit of Fig. 1 performs essentially the function of envelope detection, z(t) is not strictly proportional to the envelope. This linear detector has also been analyzed by a number of writers. Appropriate mathematical techniques and original literature references are given by Davenport and Root [2], among others.

In this paper, a detector circuit is considered having a band-pass voltage transfer characteristic as shown in Fig. 2. If the circuit of Fig. 2 were used in a radio receiver (for demodulation of pulses, for example), the effect of the band-pass nonlinear element would be to make the detector amplitude selective. That is, the output voltage z(t) would be large only if the RF pulses at the input possessed the appropriate amplitude.


Fig. 2—Amplitude selective detector.


A detector of this type might be used in 
desensitizing a receiver to impulsive inter- 
ference caused by electrical discharges or 
by high-power pulse transmitters located 
in proximity to the receiver. This technique 
has been used in radar receivers to a minor 
extent [3]. Its usefulness is obviously 
limited to cases in which the interfering 
pulses are very large relative to information- 


* Received by the PGIT, June 21, 1960; revised 
manuscript received, November 28, 1960. 


IRE TRANSACTIONS ON INFORMATION THEORY 


bearing signals. Basically, however, an 
amplitude-sensitive detector is useful 
whenever information is contained in the 
received strength of pulse signals. 


NOISE IN AN AMPLITUDE-SENSITIVE DETECTOR


Two parameters of the random noise at the output z(t) are of interest here: the mean value, $m_z$, and the variance, $\sigma_z^2$. In general, these quantities can both be obtained from the output autocorrelation function $R_z(\tau)$. Specifically,

$$\sigma_z^2 = R_z(0) - m_z^2,$$

$$R_z(\tau) = \int\!\!\int h(u)\,h(v)\,R_y(\tau + v - u)\,du\,dv.$$

The linear low-pass filters will be assumed to have a dc gain of unity, which means that

$$m_z = m_y = \left[\lim_{\tau\to\infty} R_y(\tau)\right]^{1/2},$$

$$R_y(\tau) = \int\!\!\int y_1 y_2\, p_y(y_1, y_2; \tau)\,dy_1\,dy_2 = \int\!\!\int f(x_1)\,f(x_2)\,p_x(x_1, x_2; \tau)\,dx_1\,dx_2.$$

The above equations give the mathematical machinery necessary to compute $m_z$ and $\sigma_z$. Notice that it is sufficient to know: 1) $p_x(x_1, x_2; \tau)$, the second-order probability-density (p.d.) function corresponding to the input x; 2) f(x), the voltage transfer characteristic of the nonlinear device; and 3) h(t), the impulse response of the linear filter.

$R_y(\tau)$ can be evaluated exactly in terms of a closed expression when the following assumptions are made:

1) x(t) originates from a stationary Gaussian random process;

2) f(x) is Gaussian in shape,

$$f(x) = \exp\left[-\frac{(x - A)^2}{2a^2}\right].$$

The following paragraphs are devoted to the special case specified by assumptions 1) and 2).

Since x(t) is from a Gaussian process, $p_x(x_1, x_2; \tau)$ is of the form

$$p_x(x_1, x_2; \tau) = \frac{1}{2\pi\sigma_x^2[1-\rho_x^2(\tau)]^{1/2}} \exp\left\{-\frac{(x_1-A-\Delta)^2 + (x_2-A-\Delta)^2 - 2\rho_x(\tau)(x_1-A-\Delta)(x_2-A-\Delta)}{2\sigma_x^2[1-\rho_x^2(\tau)]}\right\},$$

where $A + \Delta = m_x$.

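The exponent of the integrand of $R_y(\tau)$ takes on a quadratic form. By means of the linear transformation

$$x_1 = v_1 + A + \Delta - k, \qquad x_2 = v_2 + A + \Delta - k,$$

the exponential integrand may be written

$$\exp\left[-\frac{v_1^2 + v_2^2 - 2\rho v_1 v_2}{2\sigma^2(1-\rho^2)} - \frac{K}{2}\right].$$

Since K is independent of the integration,

$$R_y(\tau) = \frac{e^{-K/2}}{2\pi\sigma_x^2[1-\rho_x^2(\tau)]^{1/2}} \int\!\!\int \exp\left[-\frac{v_1^2 + v_2^2 - 2\rho v_1 v_2}{2\sigma^2(1-\rho^2)}\right] dv_1\,dv_2.$$

Like K, σ and ρ are products of the transformation which depend on $\sigma_x$, $\rho_x$, and a. Specifically,

$$R_y(\tau) = \frac{\sigma^2}{\sigma_x^2}\left[\frac{1-\rho^2}{1-\rho_x^2(\tau)}\right]^{1/2} e^{-K/2},$$

$$\sigma^2 = \frac{\sigma_x^2[1-\rho_x^2(\tau)]}{(1-\rho^2)\{1 + (\sigma_x/a)^2[1-\rho_x^2(\tau)]\}},$$

$$\rho = \frac{\rho_x(\tau)}{1 + (\sigma_x/a)^2[1-\rho_x^2(\tau)]},$$

$$K = \frac{2(\Delta/a)^2}{1 + (\sigma_x/a)^2[1+\rho_x(\tau)]}.$$

Finally,

$$R_y(\tau) = \left\{[1+(\sigma_x/a)^2]^2 - [(\sigma_x/a)^2\rho_x(\tau)]^2\right\}^{-1/2} \exp\left\{-\frac{(\Delta/a)^2}{1+(\sigma_x/a)^2[1+\rho_x(\tau)]}\right\}.$$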


The above equation holds for a Gaussian-shaped detector characteristic and Gaussian noise. Here

$$\rho_x(\tau) = \frac{R_n(\tau)}{R_n(0)},$$

where $R_n(\tau)$ is the autocorrelation function of the input noise.
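As a numerical check on the closed form for $R_y(\tau)$, the sketch below compares it with a Monte Carlo average of $f(x_1)f(x_2)$ over jointly Gaussian samples. It assumes the Gaussian characteristic $f(x) = \exp[-(x-A)^2/(2a^2)]$ and an input of mean $A + \Delta$, variance $\sigma_x^2$, and correlation coefficient $\rho_x$ at the lag of interest; the function names, sample size, and seed are illustrative choices, not part of the original analysis.

```python
import numpy as np


def ry_closed_form(rho_x, sigma_x, a, delta):
    """R_y(tau) from the closed form, given rho_x = rho_x(tau)."""
    s = (sigma_x / a) ** 2
    amp = ((1.0 + s) ** 2 - (s * rho_x) ** 2) ** -0.5
    return amp * np.exp(-((delta / a) ** 2) / (1.0 + s * (1.0 + rho_x)))


def ry_monte_carlo(rho_x, sigma_x, a, delta, A=1.0, n=400_000, seed=1):
    """Monte Carlo estimate of E[f(x1) f(x2)] for the jointly Gaussian pair."""
    rng = np.random.default_rng(seed)
    mean = np.full(2, A + delta)                      # m_x = A + delta
    cov = sigma_x**2 * np.array([[1.0, rho_x], [rho_x, 1.0]])
    x = rng.multivariate_normal(mean, cov, size=n)
    f = np.exp(-((x - A) ** 2) / (2.0 * a**2))       # Gaussian characteristic
    return float(np.mean(f[:, 0] * f[:, 1]))


for params in [(0.0, 0.5, 1.0, 0.0), (0.6, 1.0, 0.7, -0.8), (0.9, 2.0, 1.5, 0.3)]:
    print(params, ry_closed_form(*params), ry_monte_carlo(*params))
```

With $\rho_x = 0$ the closed form reduces to the square of the mean output, $[1+(\sigma_x/a)^2]^{-1}\exp\{-(\Delta/a)^2/[1+(\sigma_x/a)^2]\}$, which is the $\tau \to \infty$ limit used for $m_z$.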


SIGNAL-TO-NOISE RATIO


It is informative to determine the mean and rms values of the output z at the trailing edge of rectangular input signal pulses mixed additively with Gaussian noise. Prior to the trailing edge of a rectangular pulse, for a time equal to its duration, the input x(t) is still stationary and Gaussian, satisfying the previous assumption. The value of $m_z$ at an instant corresponding to the trailing edge of the input pulse is defined as $m_s$; $\sigma_s$ will be similarly defined.

To evaluate the SNR defined later, it is necessary to compute $m_z$ and $\sigma_z$ when noise only is present. The mean and rms values under these conditions will be referred to as $m_N$ and $\sigma_N$, respectively.

If h(t) is assumed to be rectangular and of length T, $[R_z(\tau)]_{\tau=0}$ reduces to the form




When noise only is present, $m_x = A + \Delta = 0$ and $\Delta = -A$. The value of $\sigma_N$ can be written approximately,


Mn 


2 ae = 
lelanna =O \/ WT 


7) 1/2 
[ete +o] - 


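The mean values $m_s$ and $m_N$ can be evaluated from $R_y(\tau)$ as $\tau \to \infty$, where $\rho_x(\tau) \to 0$ and $m_z^2 = \lim R_y(\tau)$:

$$m_s = [1+(\sigma_x/a)^2]^{-1/2}\exp\left\{-\frac{(\Delta/a)^2}{2[1+(\sigma_x/a)^2]}\right\}, \qquad m_N = [1+(\sigma_x/a)^2]^{-1/2}\exp\left\{-\frac{(A/a)^2}{2[1+(\sigma_x/a)^2]}\right\}.$$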


Fig. 3 illustrates the significance of the output SNR, $\mathcal{R}_0$. $\mathcal{R}_0$ is the distance from V to $m_s$ expressed in $\sigma_s$ units, or the distance between V and $m_N$ expressed in $\sigma_N$ units; V is defined by

$$\frac{m_s - V}{\sigma_s} = \frac{V - m_N}{\sigma_N}.$$

Hence, $\mathcal{R}_0$ measures the number of standard deviations that a threshold, set for approximately equal signal and false-alarm probabilities, is separated from the mean of either the signal-plus-noise or the noise-only distribution. Eliminating V gives $\mathcal{R}_0 = (m_s - m_N)/(\sigma_s + \sigma_N)$. Substituting $m_s$, $m_N$, $\sigma_s$, and $\sigma_N$ into the defining equation, we obtain
R 1 = oF ee 
y = : . 
V 2xWT dass ; R? 1/2 
[log fay}? + @ G84") log fla) + 2-~— a5 
(ie tas) 


$$R_z(0) = \sigma_z^2 + m_z^2 = \frac{2}{T}\int_0^T \left(1 - \frac{\tau}{T}\right) R_y(\tau)\,d\tau.$$


A closed form for $\sigma_s^2$ follows for the case where $\Delta = 0$ and where $\rho_x(\tau) = e^{-2\pi W|\tau|}$. This corresponds to the condition wherein the peak of the signal pulse is centered with respect to f(x) (Fig. 3).


$$[\sigma_z^2]_{\Delta=0} = \sigma_s^2 \approx \frac{1}{\pi W T(1+\alpha^2)}\log\frac{2(1+\alpha^2)}{1+\alpha^2+\sqrt{1+2\alpha^2}},$$

where $\alpha = \sigma_x/a$.
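The output statistics above can also be evaluated numerically for the rectangular filter. The sketch below assumes, as in the closed-form case, $\rho_x(\tau) = e^{-2\pi W|\tau|}$. It computes $m_z$ and $\sigma_z^2$ from the $R_z(0)$ integral, forms the threshold V and $\mathcal{R}_0 = (m_s - m_N)/(\sigma_s + \sigma_N)$ from the defining relation, and compares $\sigma_s^2$ with the approximation $\sigma_s^2 \approx \log[2(1+\alpha^2)/(1+\alpha^2+\sqrt{1+2\alpha^2})]/[\pi WT(1+\alpha^2)]$, $\alpha = \sigma_x/a$. That approximation is our own reduction of the integral for $WT \gg 1$ (dropping the $\tau/T$ term), not the exact expression of [7]. Parameter values and function names are hypothetical.

```python
import numpy as np


def ry(tau, W, sigma_x, a, delta):
    """Closed-form R_y(tau), assuming rho_x(tau) = exp(-2*pi*W*|tau|)."""
    rho = np.exp(-2.0 * np.pi * W * np.abs(tau))
    s = (sigma_x / a) ** 2
    amp = ((1.0 + s) ** 2 - (s * rho) ** 2) ** -0.5
    return amp * np.exp(-((delta / a) ** 2) / (1.0 + s * (1.0 + rho)))


def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def output_mean_var(W, T, sigma_x, a, delta, n=400_001):
    """m_z and sigma_z**2 for a rectangular filter of length T and unit dc gain,
    using sigma_z**2 = (2/T) * int_0^T (1 - tau/T) [R_y(tau) - m_z**2] dtau."""
    s = (sigma_x / a) ** 2
    m2 = np.exp(-((delta / a) ** 2) / (1.0 + s)) / (1.0 + s)  # m_z**2 = R_y(inf)
    tau = np.linspace(0.0, T, n)
    integrand = (1.0 - tau / T) * (ry(tau, W, sigma_x, a, delta) - m2)
    return float(np.sqrt(m2)), 2.0 / T * trapezoid(integrand, tau)


def sigma_s2_large_wt(W, T, sigma_x, a):
    """Approximate sigma_s**2 (delta = 0) with the tau/T term neglected (WT >> 1)."""
    s = (sigma_x / a) ** 2  # alpha**2
    ratio = 2.0 * (1.0 + s) / (1.0 + s + np.sqrt(1.0 + 2.0 * s))
    return float(np.log(ratio) / (np.pi * W * T * (1.0 + s)))


def output_snr(A, sigma_x, a, W, T):
    """Threshold V and R0 from (m_s - V)/sigma_s = (V - m_N)/sigma_N."""
    m_s, var_s = output_mean_var(W, T, sigma_x, a, delta=0.0)  # pulse centered on f
    m_n, var_n = output_mean_var(W, T, sigma_x, a, delta=-A)   # noise only
    sd_s, sd_n = np.sqrt(var_s), np.sqrt(var_n)
    V = (m_s * sd_n + m_n * sd_s) / (sd_s + sd_n)
    return {"V": V, "R0": (m_s - m_n) / (sd_s + sd_n),
            "m_s": m_s, "m_N": m_n, "sigma_s": sd_s, "sigma_N": sd_n}


res = output_snr(A=4.0, sigma_x=1.0, a=1.0, W=1.0, T=50.0)  # input SNR A/sigma_x = 4
print(res["R0"] / np.sqrt(2 * np.pi * 1.0 * 50.0))
```

Subtracting $m_z^2$ inside the integral is exact here, because $(2/T)\int_0^T(1-\tau/T)\,d\tau = 1$; it also avoids cancellation between two nearly equal quantities.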


Fig. 3—Input and output waveforms. 


¹ An exact closed expression for $\sigma_s^2$ is given in [7].


$\mathcal{R}_0/\sqrt{2\pi WT}$ is plotted in Fig. 4 for three values of the input SNR, $\mathcal{R}$, and includes data points obtained from measurements on the nonlinear device described by Figs. 5 and 6. Fig. 5 shows a rather good fit to the true Gaussian characteristic assumed in the analysis; the circuit of Fig. 6 describes the experimental setup and the diode logic circuit arrangement used to obtain this characteristic ($V_3$ = 5 volts). An RC integrator, in this application, is approximately equivalent to the ideal integrator assumed in the analysis as long as $RC > 1/(2\pi W)$.


Fig. 4—Normalized output SNR, $\mathcal{R}_0/\sqrt{2\pi WT}$, calculated for $\mathcal{R}$ = 2, 4, and 10, with measured data (all data points lowered by 10 per cent).


Fig. 5—Gaussian nonlinear characteristic (input in volts).


Fig. 6—Schematic diagram of experimental amplitude filter. Blocks shown: wide-band noise source (500 kc), adjustable bias, narrow-band filter, Philbrick K2-P chopper-stabilized amplifier, nonlinear circuit (T-86 diodes), integrator, and dc voltmeter.


Fig. 7—$\mathcal{R}_0/\sqrt{2\pi WT}$ measured for triangular and trapezoidal characteristics ($\mathcal{R}$ = 2, 4, and 10).


CONCLUSIONS


In the sense that the amplitude sensitive 
detector (Figs. 2 and 3) is capable of 
separating a pulse of known height from 
other larger or smaller pulses (nonover- 
lapping in time), the circuit can be thought 
of as a band-pass amplitude filter. With 
this in mind we ask the question: what 
amplitude bandwidth maximizes the SNR $\mathcal{R}_0$?

The calculated curves of Fig. 4 show max- 
tang Awl, G) GP hs ex ANS) (@ SS gyi, 
$\mathcal{R} = A/\sigma_x$). Although this was not checked for all $\mathcal{R}$, the curves show that, in the range $2 \le \mathcal{R} \le 10$, the best value of a does not depend strongly on the amount of noise mixed with the rectangular signal pulse.

Triangular and trapezoidal nonlinear characteristics were checked experimentally and compared with the curves calculated for the Gaussian case (Fig. 7). Although the points quite naturally do not agree with the curves, the tendency is again exhibited for the maxima to occur at points indicating a fixed width-to-signal-height ratio.

It should be emphasized that this analysis was performed assuming rectangular video pulses mixed additively with Gaussian noise and hence does not apply directly to the demodulator example of Fig. 2. However, it is reasonable to expect that the same type of behavior would result; i.e., that there exists for the IF case a fixed ratio of amplitude bandwidth to signal height for maximum output SNR.

The results do apply directly to the coherent radio-frequency receiver, which has carrier frequency and phase information available. The known carrier is used as the reference signal, which is combined with the IF in a phase detector for pulse demodulation. In this case, the phase-detector output could be approximated by rectangular video pulses mixed additively with Gaussian noise.

W. M. WATERS

Res. Div. 

Electronic Communications, Inc. 

Timonium, Md. 

REFERENCES

[1] S. O. Rice, “Mathematical analysis of random noise,” Bell Sys. Tech. J., vol. 23, pp. 282-332; July, 1944; vol. 24, pp. 46-156; January, 1945.

[2] W. Davenport and W. Root, “An Introduction to the Theory of Random Signals and Noise,” McGraw-Hill Book Co., Inc., New York, N. Y.; 1958.

[3] Lawson and Uhlenbeck, “Threshold Signals,” Mass. Inst. Tech. Rad. Lab. Ser., McGraw-Hill Book Co., Inc., New York, N. Y., vol. 24; 1950.

[4] J. H. Van Vleck and D. Middleton, “A theoretical comparison of visual, aural, and meter reception of pulsed signals in the presence of noise,” J. Appl. Phys., vol. 17, pp. 940-971; November, 1946.

[5] B. A. Varchaver, “On the theory of transmitting signals with multiple discrete values,” Radiotekhnika (Moscow); January, 1959.

[6] P. M. Woodward, “Probability and Information Theory with Applications to Radar,” McGraw-Hill Book Co., Inc., New York, N. Y.; 1953.

[7] W. M. Waters, “… Filters,” Rad. Lab., The Johns Hopkins Univ., Baltimore, Md., Tech. Rept. No. AF-77; May, 1960.


A Frequency-Weighted Mean- 
Square Error Criterion* 


The mean-square error criterion is com- 
monly used for the optimization of linear 
filters which perform predicting and smooth- 
ing operations upon random processes. The 
mean-square error may be expressed in the 
following form: 


$$\mathrm{EMS} = \int_{-\infty}^{+\infty} f_e(\lambda)\,d\lambda, \qquad (1)$$

where $f_e(\lambda)$ is the power spectral density of the error. The error power spectrum is a function of the signal and noise power spectra, the amount of delay or prediction desired, and the frequency function of the filter. Expression (1) is minimized by adjusting the filter function.

To write $f_e(\lambda)$, the following notation is used:

s(t) = signal.
n(t) = noise.

Ensemble averages are represented by E{ }.

$\psi_{sn}(\tau) = E[s(t)n(t+\tau)]$ = cross-correlation function.

$f_{sn}(\lambda) = \int_{-\infty}^{+\infty} d\tau\,\psi_{sn}(\tau)\exp[-2\pi i\lambda\tau]$ = cross-spectral density.

$f_s(\lambda)$ = signal power spectrum.
$f_n(\lambda)$ = noise power spectrum.
$k(\lambda)$ = filter frequency function.
λ = frequency in cps.
D = delay time.

The signal-plus-noise spectrum is denoted by

$$g(\lambda) = f_s(\lambda) + f_{sn}(\lambda) + f_{ns}(\lambda) + f_n(\lambda).$$

It is assumed that $f_s(\lambda)$, $f_n(\lambda)$, and $g(\lambda)$ have the Hopf–Wiener factorization $f(\lambda) = |f_+(\lambda)|^2 = f_-(\lambda)f_+(\lambda)$. $f_-(\lambda)$ has poles and zeros only in the upper half plane; $f_+(\lambda)$ has poles and zeros only in the lower half plane.

The input to the filter is s(t) + n(t). The filter output is denoted by φ(t). The error is represented by φ(t) − s(t − D). Thus the error power spectrum is

$$f_e(\lambda) = g(\lambda)|k(\lambda)|^2 - 2\,\mathrm{Re}\{[f_s(\lambda) + f_{sn}(\lambda)]\,k(\lambda)\exp[2\pi i\lambda D]\} + f_s(\lambda). \qquad (2)$$

* Received by the PGIT, September 28, 1960; revised manuscript received, December 9, 1960. The work reported here was supported by Bell Telephone Labs., Murray Hill, N. J.


In many instances the error power is critical for only certain frequency ranges. Although error power outside those ranges may be large, its effect may not be disturbing. Multiplying the error power spectrum by a frequency function which is large for the critical frequency ranges and small elsewhere will yield a weighted mean-square error. The frequency-weighting function is designated by C(λ), and it is assumed to be Hopf–Wiener factorable. The weighted mean-square error may be written as


$$\mathrm{EMS}_w = \int_{-\infty}^{+\infty} f_e(\lambda)\,C(\lambda)\,d\lambda. \qquad (3)$$
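Equations (2) and (3) translate directly into a numerical routine on a frequency grid, which is convenient for comparing candidate filters under a given weighting. The sketch assumes real processes, so that $f_{ns}(\lambda)$ is the complex conjugate of $f_{sn}(\lambda)$; the spectra, the weighting C(λ), the grid, the trapezoidal integration, and the function names are all hypothetical choices made for illustration.

```python
import numpy as np


def error_spectrum(lam, k, f_s, f_n, f_sn, D):
    """f_e(lambda) of Eq. (2); g = f_s + f_sn + f_ns + f_n with f_ns = conj(f_sn)."""
    g = f_s + f_n + 2.0 * np.real(f_sn)
    cross = (f_s + f_sn) * k * np.exp(2j * np.pi * lam * D)
    return g * np.abs(k) ** 2 - 2.0 * np.real(cross) + f_s


def weighted_mse(lam, f_e, C):
    """EMS_w of Eq. (3) by the trapezoidal rule on the (finite) grid lam."""
    y = f_e * C
    return float(np.sum((y[1:] + y[:-1]) * np.diff(lam)) / 2.0)


lam = np.linspace(-50.0, 50.0, 200_001)
f_s = 1.0 / (1.0 + lam**2)            # hypothetical signal spectrum
f_n = np.full_like(lam, 0.05)         # hypothetical white noise, uncorrelated
f_sn = np.zeros_like(lam)
C = 1.0 / (1.0 + (lam / 5.0) ** 2)    # hypothetical weighting, small at high lambda
k_wiener = f_s / (f_s + f_n)          # noncausal unweighted smoother, D = 0
print(weighted_mse(lam, error_spectrum(lam, k_wiener, f_s, f_n, f_sn, 0.0), C))
```

Two quick sanity checks follow from (2): a pure delay $k(\lambda) = \exp[-2\pi i\lambda D]$ with no noise gives $f_e \equiv 0$, and $k = 0$ gives $f_e = f_s$.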


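The filter that minimizes $\mathrm{EMS}_w$ is readily obtained from Wiener’s work.¹ The result is

$$k(\lambda) = \frac{1}{g_-(\lambda)C_-(\lambda)} \int_0^{\infty} dt\,\exp[-2\pi i\lambda t] \int_{-\infty}^{+\infty} d\mu\,\frac{[f_s(\mu) + f_{ns}(\mu)]\,C_-(\mu)}{g_+(\mu)}\exp[2\pi i\mu(t - D)]. \qquad (4)$$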
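For rational spectra, the Hopf–Wiener factors used in (4) can be obtained by splitting the roots of the numerator and denominator polynomials between the two half planes, following the convention stated earlier ($f_-$ carries the upper-half-plane zeros and poles). The sketch below is a minimal version of that root-splitting technique. It assumes real polynomial coefficients, positive values on the real axis (hence no real roots), and positive leading coefficients; `half_plane_factors` is a hypothetical helper name.

```python
import numpy as np


def half_plane_factors(num, den):
    """Factor g = num/den as f_minus * f_plus by splitting roots between half planes.

    num, den: real coefficients (highest power first) of polynomials that are
    positive on the real lambda axis (so they have no real roots).
    f_minus collects the zeros and poles in the upper half plane, f_plus
    those in the lower half plane; f_plus = conj(f_minus) on the real axis.
    """
    zeros_up = [z for z in np.roots(num) if z.imag > 0]
    poles_up = [p for p in np.roots(den) if p.imag > 0]
    gain = np.sqrt(num[0] / den[0])

    def make_factor(conjugate):
        def factor(lam):
            lam = np.asarray(lam, dtype=complex)
            out = np.full(lam.shape, gain, dtype=complex)
            for z in zeros_up:
                out = out * (lam - (np.conj(z) if conjugate else z))
            for p in poles_up:
                out = out / (lam - (np.conj(p) if conjugate else p))
            return out
        return factor

    return make_factor(False), make_factor(True)


# Hypothetical example: g(lambda) = (lambda^2 + 4) / (lambda^2 + 1)
f_minus, f_plus = half_plane_factors([1.0, 0.0, 4.0], [1.0, 0.0, 1.0])
lam = np.linspace(-10.0, 10.0, 101)
print(np.max(np.abs(f_minus(lam) * f_plus(lam) - (lam**2 + 4) / (lam**2 + 1))))
```

Because each upper-half-plane root z has its mirror image conj(z) among the roots of a real polynomial, the product of the two factors recovers g exactly, and on the real axis the two factors are complex conjugates.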


In using a frequency-weighted criterion, care must be taken so that the output of the optimal filter will be finite. If $C(\lambda) \sim \lambda^{2n}$, $n \ge 0$, as $\lambda \to \infty$, then the output power of the filter will be finite. Thus, in general, the weighting function may be very small at high frequencies, but it cannot go to zero as λ approaches infinity unless a finite-output-power restriction is explicitly imposed upon the system. Should $C(\lambda) \to 0$ as $\lambda \to \infty$ with no explicit finite-power restriction, the integral of the output power spectrum of the optimal filter will diverge.

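If the integral diverges, a finite-power restriction may be imposed by means of Lagrange’s method of undetermined multipliers. The filter output power is specified by

$$P = \int_{-\infty}^{+\infty} g(\lambda)\,|k(\lambda)|^2\,d\lambda. \qquad (5)$$

P is required to be equal to some constant, such as the saturation level of the system. The optimal filter must minimize

$$\int_{-\infty}^{+\infty}\left[f_e(\lambda)C(\lambda) + \gamma\,g(\lambda)|k(\lambda)|^2\right]d\lambda, \qquad (6)$$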


¹ N. Wiener, “Extrapolation, Interpolation, and Smoothing of Stationary Time Series,” John Wiley and Sons, Inc., New York, N. Y., ch. 3; 1957.




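where γ is some constant, not immediately specified. To obtain the optimal filter, define $g_\gamma(\lambda) = g(\lambda)[C(\lambda) + \gamma]$. From a comparison of expressions (2)-(4) with (6), the optimal filter is

$$k(\lambda;\gamma) = \frac{1}{g_{\gamma-}(\lambda)} \int_0^{\infty} dt\,\exp[-2\pi i\lambda t] \int_{-\infty}^{+\infty} d\mu\,\frac{[f_s(\mu) + f_{ns}(\mu)]\,C(\mu)}{g_{\gamma+}(\mu)}\exp[2\pi i\mu(t - D)]. \qquad (7)$$

γ is adjusted so that the power constraint (5) is satisfied.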
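The adjustment of the multiplier γ can be illustrated with a deliberately simplified filter. The sketch below drops the causal projection in (7) and instead uses the pointwise (unrealizable) minimizer of (6), $k = C f_s/[g(C+\gamma)]$, assuming uncorrelated signal and noise and D = 0. Its output power decreases monotonically in γ, so a bisection finds the γ that meets a target $P_0$. This is only a stand-in for the realizable filter of (7), and the spectra, target, and function names are hypothetical.

```python
import numpy as np


def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def output_power(gamma, lam, f_s, f_n, C):
    """P(gamma) of Eq. (5) for the unrealizable weighted filter
    k = C f_s / (g (C + gamma)), with f_sn = 0 and zero delay."""
    g = f_s + f_n
    k = C * f_s / (g * (C + gamma))
    return trapezoid(g * np.abs(k) ** 2, lam)


def solve_gamma(P0, lam, f_s, f_n, C, hi=1.0, tol=1e-10):
    """Bisection on gamma >= 0 so that output_power(gamma) = P0 (P decreases in gamma)."""
    if output_power(0.0, lam, f_s, f_n, C) <= P0:
        return 0.0                          # constraint already satisfied
    while output_power(hi, lam, f_s, f_n, C) > P0:
        hi *= 2.0                           # bracket the root
    lo = 0.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if output_power(mid, lam, f_s, f_n, C) > P0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam = np.linspace(-20.0, 20.0, 40_001)
f_s = 1.0 / (1.0 + lam**2)                  # hypothetical signal spectrum
f_n = np.full_like(lam, 0.1)                # hypothetical white noise
C = 1.0 / (1.0 + (lam / 2.0) ** 2)          # hypothetical weighting
P_free = output_power(0.0, lam, f_s, f_n, C)
gamma = solve_gamma(0.5 * P_free, lam, f_s, f_n, C)
print(gamma, output_power(gamma, lam, f_s, f_n, C) / P_free)
```

Bisection is used because monotonicity in γ is all that is needed; the same outer loop would apply if `output_power` were replaced by the power of the realizable filter (7).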

The level and form of the weighting function in the noncritical regions are of importance in determining the optimal filter. Intuitively, little importance would
be attached to the precise value of the 
weighting function in such regions as long 
as the function is sufficiently small. How- 
ever, the effect of frequency weighting is to 
cause the filter to attenuate, relative to the 
unweighted criterion, the error power in 
the critical ranges. In doing so, the error 
power is increased in the noncritical ranges. 
The amount of additional error power that 
can be tolerated in the noncritical ranges 
specifies the weighting function in that 
range. Crudely, there is an inverse relation- 
ship between the error power and weighting 
function at each frequency. Thus, the value 
of the weighting function is critical in the 
frequency ranges where the error is non- 
critical. 


ACKNOWLEDGMENT 


The author wishes to thank Prof. P. M. 
Schultheiss for his comments and encourage- 
ment. 


DANIEL S. RUCHKIN
Dunham Lab. of Elec. Engrg. 
Yale University 

New Haven, Conn. 


Information Theory and the 
Separability of Signals with 
Overlapping Spectra* 


The author! has recently applied the 
noiseless coding theorem of information 
theory to obtain a result which is, in some 
respects, a generalization of the sampling 
theorem. The technique employed was to 
replace the probability function of the 
coding theorem by a quantity proportional 
to the spectral function of a random process. 
It may be of some interest to see that the 


* Received by the PGIT, October 19, 1960; 
revised manuscript received, January 24, 1961. 

¹ L. L. Campbell, “Minimum coefficient rate for stationary random processes,” Inform. and Control, vol. 3, pp. 360-371; December, 1960.




coding theorem for a noisy channel can be 
adapted in a similar fashion. 

Mathematically speaking, the principal 
theorems of information theory are asymp- 
totic statements about probability measures. 
Thus, the theorems, when interpreted ap- 
propriately, yield statements about any 
functions which have the same properties 
as probability density or distribution func- 
tions. 

Let $x_1(t), x_2(t), \ldots, x_M(t)$ be M uncorrelated stationary random processes with possibly overlapping spectra. The main object of this note is to obtain a measure of the amount of overlap of these spectra. This measure is obtained by determining how many of the products $x_{k_1}(t_1)x_{k_2}(t_2)\cdots x_{k_n}(t_n)$ can be separated by filtering in n-dimensional space when n is large. Except for its conceptual value in providing a measure of overlap of spectra, no application of the result is apparent to the writer. However, it is conceivable that signals which are functions of n time variables will be used in the future.

Let the processes have mean zero, variance one, autocorrelation functions $r_k(\tau)$, and spectral density functions $S_k(f)$ for $k = 1, 2, \ldots, M$. That is,


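$$E[x_k(t)\,x_k(t+\tau)] = r_k(\tau), \qquad (1)$$

$$\int_{-\infty}^{+\infty} r_k(\tau)\,e^{-2\pi i f\tau}\,d\tau = S_k(f), \qquad (2)$$

and

$$\int_{-\infty}^{+\infty} S_k(f)\,df = 1. \qquad (3)$$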


Now let $x_k^{(j)}(t)$ $(j = 1, 2, \ldots, n)$ be n uncorrelated random processes, all having the same spectral density $S_k(f)$. Let $u_i$ denote the sequence of integers $\{k_1, k_2, \ldots, k_n\}$, where $1 \le k_j \le M$. There are $M^n$ such sequences $u_i$. Finally, let

$$y(u_i; t_1, t_2, \ldots, t_n) = x_{k_1}^{(1)}(t_1)\,x_{k_2}^{(2)}(t_2)\cdots x_{k_n}^{(n)}(t_n). \qquad (4)$$

Then the autocorrelation function of y is given by

$$E[y(u_i; t_1, \ldots, t_n)\,y(u_i; t_1+\tau_1, \ldots, t_n+\tau_n)] = r_{k_1}(\tau_1)\cdots r_{k_n}(\tau_n). \qquad (5)$$

Similarly, the spectral density function of y is given by

$$S(u_i; f_1, \ldots, f_n) = S_{k_1}(f_1)\,S_{k_2}(f_2)\cdots S_{k_n}(f_n). \qquad (6)$$


It is the functions $y(u_i; t_1, \ldots, t_n)$, rather than the individual functions $x_k(t)$, which, for suitably large n, will be separated by linear filtering.




It is now possible to consider a noisy semicontinuous channel without memory which has the preceding spectral density functions as its probability functions. The space of input signals will be the set of integers $1, 2, \ldots, M$, and the space of received signals will be the real line $-\infty < f < \infty$. In view of (3), $S_k(f)$ can be regarded as a conditional probability density function for a received signal f, given the input signal k. Similarly, the extension of length n of this channel has for input space the $M^n$ sequences $u_i = \{k_1, k_2, \ldots, k_n\}$ and for received signal space the n-dimensional space $-\infty < f_j < \infty$ $(j = 1, 2, \ldots, n)$. The conditional probability density function is

$$S_{k_1}(f_1)\,S_{k_2}(f_2)\cdots S_{k_n}(f_n).$$

Let $p_k$ $(k = 1, 2, \ldots, M)$ be nonnegative numbers satisfying

$$\sum_{k=1}^{M} p_k = 1, \qquad (7)$$

and let S(f) be defined by

$$S(f) = \sum_{k=1}^{M} p_k S_k(f). \qquad (8)$$


Then the capacity, C, of this channel is given by

$$C = \max\,[H(X) - H(X \mid Y)], \qquad (9)$$

where

$$H(X) = -\sum_{k=1}^{M} p_k \log p_k, \qquad (10)$$

$$H(X \mid Y) = -\sum_{k=1}^{M}\int_{-\infty}^{+\infty} p_k S_k(f)\,\log\frac{p_k S_k(f)}{S(f)}\,df, \qquad (11)$$

and the maximum is taken over all possible sets of values of $p_1, \ldots, p_M$ which satisfy (7). Since there is no special advantage in using logarithms to the base two in this problem, all logarithms in this paper will be natural logarithms.

The coding theorem then states the following:² Let H and ε be two numbers satisfying $0 < H < C$ and $\epsilon > 0$. Then there exists a positive constant $n_0$ such that in every extension of length $n > n_0$ there exists a set $u_1, u_2, \ldots, u_N$, $N > e^{nH}$, to each of which is associated a set $A_i$ $(i = 1, 2, \ldots, N)$ such that $P(A_i \mid u_i) > 1 - \epsilon$. Moreover, the sets $A_i$ are disjoint. Here the $u_i$ are N sequences from among the $M^n$ sequences $\{k_1, k_2, \ldots, k_n\}$. Each set $A_i$ is a set of points in the n-dimensional space $-\infty < f_j < \infty$ $(j = 1, 2, \ldots, n)$. The probability $P(A_i \mid u_i)$ is the conditional probability of the received signal lying in $A_i$ when the input is the sequence $u_i$.

² A. Feinstein, “Foundations of Information Theory,” McGraw-Hill Book Co., Inc., New York, N. Y.; 1958.

The coding theorem must now be translated back to spectral terminology. The theorem states that there are N sets of integers $\{k_1, k_2, \ldots, k_n\}$ and N disjoint sets $A_i$ in n-dimensional space such that

$$\int\!\!\cdots\!\!\int_{A_i} S_{k_1}(f_1)\,S_{k_2}(f_2)\cdots S_{k_n}(f_n)\,df_1\cdots df_n > 1 - \epsilon, \qquad i = 1, 2, \ldots, N. \qquad (12)$$

Now the integrand is just the spectral density function of the function $y(u_i; t_1, \ldots, t_n)$ defined by (4). Thus (12) says that a fraction $1 - \epsilon$ of the power in $y(u_i; t_1, \ldots, t_n)$ falls in the set $A_i$ for $i = 1, 2, \ldots, N$. Moreover, no more than a fraction ε of the power from any of the other functions $y(u_j; t_1, \ldots, t_n)$ falls in $A_i$, for $j = 1, 2, \ldots, N$ and $j \ne i$. Therefore, since ε was arbitrary, and since the sets $A_i$ are disjoint, it is possible to obtain arbitrarily good separation of the N signals $y(u_i; t_1, \ldots, t_n)$ by a “filtering” process in n-dimensional frequency space.

Somewhat more roughly, there are $e^{nC}$ distinguishable products $x_{k_1}(t_1)\cdots x_{k_n}(t_n)$. Since the number of possible products is $M^n$, it is natural to take a geometric mean and say that there are $e^C$ distinguishable random processes in the original set $x_1(t), \ldots, x_M(t)$. Of course, the term “distinguishable” must be understood only in the sense used here.

In the trivial case that the spectra of the $x_k(t)$ are nonoverlapping, i.e., that $S_j(f)S_k(f) = 0$ for all f whenever $j \ne k$, it is easily shown from information-theoretic considerations that $H(X \mid Y) = 0$ and hence that $C = \log M$. In this case there are $e^C = M$ distinguishable functions, in agreement with simpler notions of distinguishability. Similarly, if all the spectra are identical, so that $S_k(f) = S(f)$, it follows that $C = 0$. Thus there is only one distinguishable function.

A more interesting example is provided by two partially overlapping rectangular spectra. Let

$$S_k(f) = \begin{cases} W^{-1}, & f_k < f < f_k + W, \\ 0, & \text{otherwise}, \end{cases} \qquad k = 1, 2, \qquad (13)$$

and let $f_1 < f_2 < f_1 + W$. A simple calculation shows that

$$C = \frac{f_2 - f_1}{W}\log 2, \qquad (14)$$

and the number of distinguishable signals is

$$e^C = 2^{(f_2 - f_1)/W}. \qquad (15)$$

As $f_2$ increases from $f_1$ to $f_1 + W$, $e^C$ increases from one to two.
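The capacity defined by (9)-(11) can also be computed numerically, which gives an independent check on (14). The sketch below discretizes the two rectangular spectra of (13) into frequency bins, treats the result as a discrete memoryless channel, and maximizes the mutual information with the Blahut–Arimoto iteration. That is a standard numerical method for channel capacity, swapped in here; it is not the note's argument. Bin width, tolerances, and function names are illustrative assumptions.

```python
import numpy as np


def blahut_arimoto(Q, iters=5000, tol=1e-13):
    """Capacity in nats of a discrete memoryless channel Q[x, y] = P(y | x)."""
    logQ = np.log(np.where(Q > 0, Q, 1.0))        # 0 * log 0 is taken as 0

    def divergences(p):
        r = p @ Q                                  # output distribution
        logr = np.log(np.where(r > 0, r, 1.0))
        return np.sum(Q * (logQ - logr), axis=1)   # D(Q[x] || r) for each input x

    p = np.full(Q.shape[0], 1.0 / Q.shape[0])
    for _ in range(iters):
        p_new = p * np.exp(divergences(p))
        p_new /= p_new.sum()
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    return float(p @ divergences(p))               # mutual information I(X; Y)


def rectangular_channel(f1, f2, W, df):
    """Bin S_k(f) = 1/W on (f_k, f_k + W), k = 1, 2, into cells of width df."""
    edges = np.arange(f1, f2 + W + df / 2, df)
    centers = 0.5 * (edges[:-1] + edges[1:])
    Q = np.array([((centers > fk) & (centers < fk + W)) * df / W for fk in (f1, f2)])
    return Q / Q.sum(axis=1, keepdims=True)        # absorb any discretization error


for f2 in (0.0, 0.3, 0.75, 1.0):
    C = blahut_arimoto(rectangular_channel(0.0, f2, 1.0, 0.01))
    print(f2, C, f2 * np.log(2.0), np.exp(C))
```

In this example the discretized channel is a binary erasure channel whose erasure probability is the overlap fraction $1 - (f_2 - f_1)/W$, which is why the numerical capacity reproduces (14).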


L. L. CAMPBELL

Essex College 

Assumption University of Windsor 
Windsor, Ontario, Canada. 


On the Approximation to Likelihood
Ratio Detector Laws (The
Threshold Case)*


In a recent note¹ Bussgang and Mudgett emphasize the fact that, for the case of a sine wave in noise, it is not sufficient to approximate the logarithm of the likelihood ratio by using only one term in the expansion of $\log I_0(\eta r)$, but that a third term is required in order that the expected value of the detector output will converge with respect to the null hypothesis. In the case of a sequential test, two terms in the likelihood ratio lead to an average sample number which diverges at the null hypothesis. I agree completely with the authors that this point, although emphasized in other publications, still is not recognized by many.

Blasbalg? has shown that the use of the 
first term in the approximation to the 
logarithm of the likelihood ratio always 
leads to an expected value which is zero 
under the null hypothesis (at least where 
the indicated expansion is valid), and 
hence a divergent ASN function at this
point. We will prove this again. 

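Assume that we are testing the hypothesis $H_0$ that $\theta < \theta_0$ against the alternative hypothesis $H_1$ that $\theta > \theta_1$ $(\theta_0 < \theta_1)$. Then, when $\theta_1 - \theta_0 \ll 1$, the threshold case, the log-likelihood ratio for a single sample is

$$\log\frac{P(r,\theta_1)}{P(r,\theta_0)} = \log\left[1 + \left(\frac{P(r,\theta_1)}{P(r,\theta_0)} - 1\right)\right] = \left(\frac{P(r,\theta_1)}{P(r,\theta_0)} - 1\right) - \frac{1}{2}\left(\frac{P(r,\theta_1)}{P(r,\theta_0)} - 1\right)^2 + \cdots. \qquad (1)$$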


* Received by the PGIT, November 3, 1960.

¹ J. J. Bussgang and W. L. Mudgett, “A note of caution on the square-law approximation to an optimum detector,” IRE Trans. on INFORMATION THEORY, vol. IT-6 (Correspondence), pp. 504-505; September, 1960.

² H. Blasbalg, “The sequential detection of a sine-wave carrier of arbitrary duty ratio in Gaussian noise,” IRE Trans. on INFORMATION THEORY, vol. IT-3, pp. 248-256; December, 1957.


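If we consider only the first two terms in the expansion, we have for the detector law

$$z = \left(\frac{P(r,\theta_1)}{P(r,\theta_0)} - 1\right) - \frac{1}{2}\left(\frac{P(r,\theta_1)}{P(r,\theta_0)} - 1\right)^2; \qquad (2)$$

then the expected value of the first term in (2) is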


$$E_{H_0}\left[\frac{P(r,\theta_1)}{P(r,\theta_0)} - 1\right] = \int_{-\infty}^{+\infty}\left[\frac{P(r,\theta_1)}{P(r,\theta_0)} - 1\right]P(r,\theta_0)\,dr = \int_{-\infty}^{+\infty}[P(r,\theta_1) - P(r,\theta_0)]\,dr = 0. \qquad (3)$$

Hence, the expected value of the first term vanishes under the null hypothesis $\theta = \theta_0$.
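Let us now obtain these results in a more recognizable form. Let $\theta_1 = \theta_0 + \Delta\theta$ and $\Delta\theta \ll 1$. Then

$$P(r, \theta_0 + \Delta\theta) = P(r,\theta_0) + \Delta\theta\,\frac{\partial P(r,\theta)}{\partial\theta}\Big|_{\theta=\theta_0} + \cdots. \qquad (4)$$

If we include only the first two terms shown and divide through by $P(r, \theta_0)$, we have

$$\frac{P(r,\theta_0 + \Delta\theta) - P(r,\theta_0)}{P(r,\theta_0)} \approx \frac{\Delta\theta}{P(r,\theta_0)}\,\frac{\partial P(r,\theta)}{\partial\theta}\Big|_{\theta=\theta_0}. \qquad (5)$$

Substituting into (2) for the detector yields

$$z \approx \frac{\Delta\theta}{P(r,\theta_0)}\frac{\partial P(r,\theta)}{\partial\theta}\Big|_{\theta=\theta_0} - \frac{(\Delta\theta)^2}{2}\left[\frac{1}{P(r,\theta_0)}\frac{\partial P(r,\theta)}{\partial\theta}\Big|_{\theta=\theta_0}\right]^2. \qquad (6)$$

It should also be clear that

$$\frac{1}{P(r,\theta_0)}\frac{\partial P(r,\theta)}{\partial\theta}\Big|_{\theta=\theta_0} = \frac{\partial}{\partial\theta}\log P(r,\theta)\Big|_{\theta=\theta_0}. \qquad (7)$$

If we now take the expected value of (6), we have

$$E_{H_0}(z) = \Delta\theta\int_{-\infty}^{+\infty}\frac{\partial P(r,\theta)}{\partial\theta}\Big|_{\theta=\theta_0}dr - \frac{(\Delta\theta)^2}{2}\int_{-\infty}^{+\infty}\left[\frac{1}{P(r,\theta_0)}\frac{\partial P(r,\theta)}{\partial\theta}\Big|_{\theta=\theta_0}\right]^2 P(r,\theta_0)\,dr. \qquad (8)$$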

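Now, if we assume that the order of differentiation and integration can be interchanged, as will almost always be the case,

$$\int_{-\infty}^{+\infty}\frac{\partial P(r,\theta)}{\partial\theta}\Big|_{\theta=\theta_0}dr = \frac{\partial}{\partial\theta}\int_{-\infty}^{+\infty}P(r,\theta)\,dr\Big|_{\theta=\theta_0} = 0. \qquad (9)$$

Then, from (7) and (9), we have

$$E_{H_0}(z) = -\frac{(\Delta\theta)^2}{2}\,E_{H_0}\left\{\left[\frac{\partial}{\partial\theta}\log P(r,\theta)\Big|_{\theta=\theta_0}\right]^2\right\}, \qquad (10)$$

where $\Delta\theta = \theta_1 - \theta_0$. Eq. (10) is a well-known result;³ the expectation on its right-hand side is the Fisher information, whose reciprocal is the (asymptotic) variance of a maximum-likelihood estimate.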
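Results (3) and (10) can be checked numerically for a simple family. The sketch below assumes a unit-variance Gaussian location family, $P(r,\theta) = N(\theta, 1)$, whose Fisher information is 1. It integrates the first term of (2) and the two-term detector output z against $P(r,\theta_0)$ on a fine grid. The family, grid, and function names are illustrative assumptions; for this family the exact value is $E_{H_0}(z) = -\tfrac{1}{2}(e^{\Delta\theta^2} - 1)$, which approaches $-\Delta\theta^2/2$ for small $\Delta\theta$.

```python
import numpy as np


def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def detector_moments(theta0, dtheta, half_width=12.0, n=200_001):
    """E_{H0} of the first term of (2) and of z, for P(r, theta) = N(theta, 1)."""
    r = np.linspace(theta0 - half_width, theta0 + half_width, n)
    p0 = np.exp(-0.5 * (r - theta0) ** 2) / np.sqrt(2.0 * np.pi)
    p1 = np.exp(-0.5 * (r - theta0 - dtheta) ** 2) / np.sqrt(2.0 * np.pi)
    first = p1 / p0 - 1.0                  # P(r, theta1)/P(r, theta0) - 1
    z = first - 0.5 * first**2             # two-term detector law (2)
    return trapezoid(first * p0, r), trapezoid(z * p0, r)


e_first, e_z = detector_moments(theta0=0.0, dtheta=0.1)
fisher = 1.0                               # Fisher information of N(theta, 1)
print(e_first, e_z, -0.5 * 0.1**2 * fisher)
```

The first moment vanishes to rounding error, as (3) requires, while the second term alone supplies a nonzero mean under $H_0$ of the size predicted by (10).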

In the case of the $\log I_0(\eta r)$ detector, when we perform the power-series approximation and include the fourth-order term to obtain convergence for the expected value at $\theta = \theta_0$, we are in fact computing (6) for the detector law and (10) for its average output. Our conclusion is, therefore, that for threshold parameter detection ($\theta_1 - \theta_0 = \Delta\theta \ll 1$), the detector at least must have the first two terms shown. (Although the significance of this result comes to our attention in sequential detection, we must conjecture that it is just as significant for fixed-sample-size tests, since the results derived represent fundamental properties of likelihood ratio tests in general.)

HERMAN BLASBALG 

IBM Federal Systems Div.

Communications 

Bethesda, Md. 

Formerly with 

Electronic Communications 

Research Div. 

Timonium, Md. 

³ A. M. Mood, “Introduction to the Theory of Statistics,” McGraw-Hill Book Co., Inc., New York, N. Y.; 1950.


Contributors

Phillip Bello (S’52–A’55): for a biography, please see page 55 of the January, 1961, issue of these TRANSACTIONS.


James A. McFadden was born in San 
Juan, Puerto Rico, on December 11, 1924. 
He received the B.S.E. degree in 1945 in 
mathematics and a year later in electrical 
engineering, the M.S. degree in 1947 in 
physics, and the Ph.D. degree in 1951, also 
in physics, all from the University of 
Michigan, Ann Arbor. 

From 1951 to 1957, he was a physicist at 
the Naval Ordnance Laboratory, Silver 
Spring, Md., working in fluid dynamics, 
acoustics, applied probability, and stochas- 
tic processes. Since 1957, he has been an 
associate professor of electrical engineering 
at Purdue University, Lafayette, Ind. 
During the summers of 1958 and 1959, he
worked for Lincoln Laboratory, Lexington, 
Mass., and in 1960, at Bell Telephone 
Laboratories, Murray Hill, N. J. 

Dr. McFadden is a member of the 
Society for Industrial and Applied Mathe- 
matics, the American Society for Engineer- 
ing Education, Tau Beta Pi, Phi Kappa 
Phi, and Sigma Xi. 


George L. Turin (M’56–SM’59) was born in New York, N. Y., on January 27, 1930. He received the B.S. and S.M. degrees from the Massachusetts Institute of Technology, Cambridge, Mass., in 1952, after completing the cooperative course in electrical engineering in association with Philco Corporation. In the summer of 1952 he was an M. I. T. Overseas Fellow at Marconi’s Wireless Telegraph Company in England.
From 1952-1956 he worked at M. I. T.’s 
Lincoln Laboratory, Lexington, Mass., in 
the field of statistical communication 
theory, first as a staff member, and later as 
a research assistant while completing his 
doctoral studies. During this latter period 
he was also a consultant to the firm of 


Edgerton, Germeshausen and Grier. He received the D.Sc. degree in electrical engineering from M. I. T. in 1956.

From 1956-1960, he was engaged in communication and radar research studies at Hughes Aircraft Company, Culver City,
Calif. During this time, he also taught part- 
time at the University of Southern Cali- 
fornia, Los Angeles, and at the University 
of California at Los Angeles. He is presently 
visiting lecturer in electrical engineering at 
the University of California, Berkeley, 
while on leave of absence from Hughes. 

Dr. Turin is a member of Eta Kappa Nu, 
Tau Beta Pi, and Sigma Xi. He is also 
Vice Chairman of the Administrative Com- 
mittee of the IRE Professional Group on 
Information Theory and a member of Com- 
mission 6.1 and 6.2 of the U. 8. National 
Committee of URSI. 




Solomon W. Golomb was born in Balti- 
more, Md., on May 31, 1932. He received 
the B.A. degree in mathematics from The 
Johns Hopkins University, Baltimore, in 
1951, and the M.A. degree from Harvard 
University, Cambridge, Mass., in 1953. 
After spending the academic year 1955-1956 
in Oslo, Norway, on a Fulbright Grant, he 
received the Ph.D. degree from Harvard in 
1957. 

He joined the staff of the Jet Propulsion 
Laboratory at the California Institute of 
Technology, Pasadena, in 1956, and is 
presently the Assistant Section Chief of its 
Communication Systems Research Section. 
He has been concerned primarily with the 
applications of discrete mathematics to 
coding and communications. 

Dr. Golomb is a member of Phi Beta 
Kappa and Sigma Xi. 




Wayne R. Cowell was born in Wakefield, Kans., on June 27, 1926. He received the B.S. and M.S. degrees in mathematics from Kansas State University, Manhattan, in 1948 and 1950, respectively, and the Ph.D. degree in mathematics from the University of Wisconsin, Madison, in 1954.

From 1954 to 1959 he was an assistant professor of mathematics at Montana State University, Missoula, where his activities included teaching and research in abstract algebra and projective geometry. Since 1959, he has been with Bell Telephone Laboratories, Inc., Murray Hill, N. J., where he is engaged in basic studies in data communication. His present interests are primarily in coding theory and error-control systems.

Dr. Cowell is a member of the American 
Mathematical Society, the Mathematical 
Association of America, Sigma Xi, Pi Mu 
Epsilon, and Phi Kappa Phi. 




D. C. Youla (SM’59) was born in Brooklyn, N. Y., on October 17, 1925. He received the B.E.E. degree from the College of the City of New York in 1947, and the M.S. degree from New York University in 1950.

From 1947 to 1949 he was employed as an instructor in the Department of Electrical Engineering at C. C. N. Y. He attended the N. Y. U. Graduate School of Mathematics as a full-time student from 1948 to 1950, and for the next two years was at Fort Monmouth, N. J., and Brooklyn Naval Shipyard working on problems of UHF and microphonics. In 1952 he joined the communication group at the Jet Propulsion Laboratory, Pasadena, Calif., and participated in the design of antijam radio links for guided missiles. In 1955 he began his present association with the Microwave Research Institute, Polytechnic Institute of Brooklyn, Brooklyn, N. Y., where he engaged in the practical and theoretical study of codes for combating noise and improving efficiency. He is now a research associate professor of electrical engineering, working actively on network synthesis problems, stability of time-variable systems, solid-state devices and n-port filtering.




Abstracts




This Section of the issue is devoted to abstracts of material which may be of interest to PGIT members. Sources are Government, Industrial and University reports, and books and journals published outside of the United States. Readers familiar with material of this nature which is suitable for abstracting are requested to communicate the pertinent information to one of the Editors or Correspondents listed below.


Editors

R. A. Epstein
Seneca 29, 4°, 1ª
Barcelona, Spain

G. L. Turin
Dept. of Electrical Engineering
University of California
Berkeley 4, Calif.

Correspondents

S. V. C. Aiya
Indian Institute of Science
Bangalore 12, India

D. A. Bell
University of Birmingham
Birmingham, England

L. L. Campbell
Essex College
Windsor, Ontario, Canada

G. Francini
Viale di Trastevere, 189
Rome, Italy

C. H. Grandjean
Laboratoire Central de Télécommunications
Paris 7e, France

H. Mine
Defense Academy
Obaradai, Yokosuka, Japan

C. Rajski
The Technical University of Warsaw
Warsaw, Poland

F. L. H. M. Stumpers
N. V. Philips Gloeilampenfabrieken
Research Laboratories
Eindhoven, Netherlands


Frequency-Time Transposition for the Measurement of an Unknown Frequency, I—R. H. Baumann (in French). (Ann. de Radioélectricité, vol. 15, pp. 305-330; October, 1960.)

A new method for the determination of the Doppler frequency of a sinusoidal signal in the presence of noise is described. The system consists of a delay line in a closed-loop arrangement such that the input signal, whose frequency is to be determined, is allowed to circulate several times in the closed-loop system. At each recirculation of the signals in the loop, the frequency of the signals is shifted by an amount equal to the reciprocal of the line delay before they are added to the input signals. These circulating sinusoidal signals are thereby transformed into impulses, whose time shift is a measure of the unknown Doppler frequency. In Part I of this investigation, an idealized system is treated theoretically, and results are also given for an experimental system.


Reversible Stationary Random Functions—A. Blanc-Lapierre (in French). (Comptes rend. acad. sci., vol. 251, pp. 1957-1959; November 7, 1960.)

The author gives various properties of stationary random functions whose moments $E[X(t_1)\cdots X(t_n)]$ are invariant when the instants $t_1, \ldots, t_n$ are replaced by $t_1', \ldots, t_n'$, respectively symmetric to the former about an arbitrary instant $t_0$.


A Correlator Employing Hall Multipliers Applied to the Analysis of Vocoder Control Signals—A. R. Billings and D. J. Lloyd (in English). (Proc. IEE, vol. 107, pt. B, p. 435; September, 1960.)

The authors define a periodic weighted correlation function which can be obtained when 1) an infinite stationary time function is replaced by the cyclic repetition of a time function of finite length, and 2) the integrators used in the correlator are imperfect. (The latter is accounted for by a weighting function.) The correlator uses the Hall effect for multiplication, and signal frequencies of the order of 25 cps or less are recorded on magnetic tape in the form of amplitude modulation of a 2-kc carrier. Auto and cross correlograms of the control signals in a 10-channel vocoder are obtained. Preliminary results show from autocorrelation that the power spectrum of a control signal occupies a bandwidth considerably less than the 25 cps commonly considered to be necessary, and from cross correlation that there is still considerable redundancy in vocoder signals.
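As a rough digital illustration of the periodic correlation described in the Billings and Lloyd abstract, the following Python sketch treats a finite record as one period of a cyclically repeated function, so that lags wrap around modulo the record length. The optional `weights` argument is a hypothetical stand-in for the authors' weighting function. The abstract does not give the form of that function, so by default all products are weighted equally. This is not a model of the Hall-effect correlator itself.

```python
import math


def cyclic_correlation(x, y, weights=None):
    """Periodic cross-correlation r(lag) = sum_k w[k] x[k] y[(k + lag) mod N] / sum_k w[k].

    x and y are equal-length records, each treated as one period of a cyclically
    repeated function. `weights` is a hypothetical per-sample weighting; by
    default all products count equally.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("records must have equal length")
    w = [1.0] * n if weights is None else list(weights)
    total = sum(w)
    return [sum(w[k] * x[k] * y[(k + lag) % n] for k in range(n)) / total
            for lag in range(n)]


# Usage: autocorrelogram of a record holding exactly 4 cycles of a cosine.
n = 32
record = [math.cos(2 * math.pi * 4 * k / n) for k in range(n)]
r = cyclic_correlation(record, record)   # r[lag] = 0.5 * cos(2*pi*4*lag/n)
```

Because the record contains a whole number of cycles, the cyclic autocorrelation is exactly a cosine in the lag, with no end effects. A finite-record (noncyclic) estimate would not have this property.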


A General Formulation of the Fundamental Theorem of Shannon 
in the Theory of Information—R. L. Dobrushin (in Russian). 
(Uspekhi Matemat. Nauk, vol. 14, pp. 3-104; November-December, 
1959.) 


In a valuable book by Shannon and Weaver, the fundamental concepts of a theory of information were introduced, and the fundamental theorem of this theory was obtained at a physical level of rigor. After this the works of McMillan and Khinchin appeared, in which a strict interpretation of the Shannon theorem was given for the case of a discrete stationary source and channel under the requirement of strict coincidence of the information being received and that being transmitted. In this, Khinchin relied essentially on the ideas of Feinstein. The work of Khinchin was extended to processes with continuous multiple states by Rosenblatt-Roth and, more particularly, by Perez. Rosenblatt-Roth also indicated the possibility of extending the theory to nonstationary processes. Also of interest are the recent works of Wolfowitz and of Blackwell, Breiman and Thomasian.

Kolmogorov's formulation of the problem opened the way to a highly general and mathematically rigorous treatment of the Shannon theorem. The aim of the present work is to prove Shannon's theorem in Kolmogorov's interpretation under sufficiently general conditions. These conditions are formulated with the help of the concept of information density, introduced into the mathematical literature by Gel'fand and Yaglom and by Perez. The frequently used, less general concept of information stability is also invoked, along with some ideas expressed in words by Gel'fand and Yaglom.
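The information density mentioned in the Dobrushin abstract can be illustrated in the simplest discrete case. It is the random variable $i(x;y) = \log_2 \frac{p(x,y)}{p(x)\,p(y)}$, and its mean is the mutual information. The sketch below assumes a finite joint distribution stored as a dictionary, which is a hypothetical representation chosen for convenience. It checks the result against the binary symmetric channel, for which the mean with uniform input is $1 - h(\epsilon)$ bits. It does not touch the general measure-theoretic setting of the paper.

```python
import math


def information_density(joint):
    """i(x; y) = log2 p(x, y) / (p(x) p(y)) for every pair with p(x, y) > 0.

    `joint` maps (x, y) pairs to probabilities summing to 1.
    """
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return {(x, y): math.log2(p / (px[x] * py[y]))
            for (x, y), p in joint.items() if p > 0}


def mutual_information(joint):
    """Mean of the information density, in bits."""
    return sum(joint[pair] * i for pair, i in information_density(joint).items())


def bsc_joint(eps):
    """Binary symmetric channel with crossover probability eps and uniform input."""
    return {(x, y): 0.5 * (eps if x != y else 1 - eps) for x in (0, 1) for y in (0, 1)}


def binary_entropy(p):
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# The density takes the values log2(1.8) and log2(0.2); its mean is 1 - h(0.1) bits.
dens = information_density(bsc_joint(0.1))
info = mutual_information(bsc_joint(0.1))
```

Information stability, in the sense used in the abstract, concerns how this density behaves for long blocks. The sketch only shows the single-letter quantity whose mean defines the information.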


On the Concept of the Instantaneous Frequency of a Signal—R. Fortet (in French). (Câbles et Transmission, vol. 14, pp. 60-73; January, 1960.)

The paper consists of three parts. The first presents some general remarks on the definition of the instantaneous frequency of a signal, as derived from the analytic-signal concept, and on the corresponding calculation of its value. The second relates to filters considered as transmitters of infinitely short pulses; its object is to compare the response of the filter to such a pulse with its response to a given input signal. In the third part, the author develops and discusses calculation methods applicable to filters of the minimum-phase type, i.e., those conforming to the Bayard-Bode relation. This study is not complete: only certain basic elements are given, more particularly a theorem according to which a network with a filtering characteristic symmetrical with respect to a central frequency does not cause any instantaneous-frequency distortion.
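The analytic-signal definition referred to in the Fortet abstract can be sketched numerically. The procedure suppresses the negative-frequency components of a sampled signal and then takes the rate of change of the phase of the result. The version below is a minimal pure-Python sketch that rests on several assumptions of its own: the finite record is treated as periodic, the DFT is a naive $O(N^2)$ one, and the phase increment is measured between successive samples. It is not Fortet's calculation method.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(N^2); adequate for short records)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def analytic_signal(x):
    """z = x + j*(Hilbert transform of x): keep DC (and Nyquist for even N),
    double positive frequencies, zero negative frequencies."""
    n = len(x)
    spectrum = dft(x)
    gain = [0.0] * n
    gain[0] = 1.0
    for k in range(1, (n + 1) // 2):
        gain[k] = 2.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    return idft([s * g for s, g in zip(spectrum, gain)])


def instantaneous_frequency(x, fs):
    """Instantaneous frequency (Hz) between successive samples of x, sampled at fs."""
    z = analytic_signal(x)
    # The phase of z[t+1] * conj(z[t]) is the phase increment, already wrapped to (-pi, pi].
    return [cmath.phase(b * a.conjugate()) * fs / (2 * math.pi) for a, b in zip(z, z[1:])]


# Usage: a 5 Hz cosine sampled at 64 Hz for one second.
fs = 64.0
x = [math.cos(2 * math.pi * 5 * t / fs) for t in range(64)]
freqs = instantaneous_frequency(x, fs)
```

Using the phase of $z_{t+1}\bar z_t$ avoids an explicit phase-unwrapping step. For a record that is not periodic, edge effects would appear near the ends.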


On the Determination of the Amount of Information Concerning a Random Function Supplied by Another Similar Function—I. M. Gelfand and A. M. Yaglom (in Russian). (Uspekhi Matemat. Nauk, vol. 12, pp. 3-52; January, 1957.)


The paper is divided into two chapters and is devoted to the main problem of information theory, that of finding the amount of information $I(\xi, \eta)$ of one random object $\xi$ about another one $\eta$. In the first chapter the amount of information is defined and its properties are discussed when $\xi$ and $\eta$ are of a very general nature, e.g., vectors, functions or generalized functions. The case of vectors is treated in detail, starting from the bases given in Appendix 7 of Shannon and Weaver's book. Under rather general set-theoretical assumptions, it is proved that the problem of finding the amount of information may be reduced to the computation of the Lebesgue-Stieltjes integral


$$I(\xi,\eta) = \iint P_{\xi\eta}(dx\,dy)\,\log\frac{P_{\xi\eta}(dx\,dy)}{P_{\xi}(dx)\,P_{\eta}(dy)}.$$


The remainder of the first chapter contains the definition of the 
amount of information for a wide class of generalized random 
functions as well as the discussion of its properties. 

The second chapter begins with the determination of the amount of information for generalized Gaussian random functions. Next, a very elegant formula is derived for the case when $\xi$ and $\eta$ are vectors. Let $\xi = (\xi_1, \xi_2, \ldots, \xi_k)$ and $\eta = (\xi_{k+1}, \xi_{k+2}, \ldots, \xi_{k+l})$ denote multivariate Gaussian variables with second central moments $m_{ij} = E\{[\xi_i - E(\xi_i)][\xi_j - E(\xi_j)]\}$. Let $A = \det \|m_{ij}\|$ for $1 \le i, j \le k$; $B = \det \|m_{ij}\|$ for $k+1 \le i, j \le k+l$; and $C = \det \|m_{ij}\|$ for $1 \le i, j \le k+l$. Then

$$I(\xi,\eta) = \frac{1}{2}\log\frac{AB}{C},$$

provided $C \neq 0$. This restriction, however, may be circumvented by a suitably chosen linear transformation of coordinates in the space of vectors $\xi$ and $\eta$.
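The Gaussian formula $I(\xi,\eta) = \frac12\log(AB/C)$ is easy to evaluate once the moment matrix is known. In the sketch below, the first `k` coordinates form $\xi$ and the rest form $\eta$. The determinant routine and the function names are our own, and natural logarithms give the result in nats. For a pair of unit-variance variables with correlation $\rho$, the formula reduces to $-\frac12\log(1-\rho^2)$, which gives a quick sanity check.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-15:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return result


def gaussian_information(m, k):
    """I(xi, eta) in nats for jointly Gaussian vectors, where xi is the first k
    coordinates, eta the rest, and m is the full matrix of second central moments."""
    A = det([row[:k] for row in m[:k]])
    B = det([row[k:] for row in m[k:]])
    C = det(m)
    if C == 0.0:
        raise ValueError("C = 0; transform coordinates first, as the abstract notes")
    return 0.5 * math.log(A * B / C)


# Usage: two unit-variance variables with correlation 0.6.
rho = 0.6
info = gaussian_information([[1.0, rho], [rho, 1.0]], 1)   # equals -0.5*log(1 - rho**2)
```

When $\xi$ and $\eta$ are uncorrelated the moment matrix is block diagonal, so $C = AB$ and the information is zero, as it should be.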


The Diffused Radiation due to Distribution Errors—J. Guittet (in French). (Rev. Tech. C.F.T.H., no. 33, pp. 29-57; October, 1960.)

Imprecision in the fabrication of an antenna affects the characteristics of its radiation pattern. The author studies this effect for an antenna with high gain and a low level of secondary lobes.


Orthogonal Codes—H. F. Harmuth (in English). (Proc. IEE, vol. 107, pt. C, p. 242; September, 1960.)

An orthogonal code m elements long is one whose characters may be represented by the positive and negative directions of m mutually orthogonal vectors in m-dimensional space. An example is a set of 32 binary characters of 16-digit length, having mutual distances of either 8 or 16 digits. An equivalent orthogonal code can also be constructed from a set of sine and cosine functions of limited duration having 1 to 8 cycles in the character interval; 32 characters result from taking 8 sine plus 8 cosine functions, doubled for plus and minus. The frequency spectrum of each character is then a sin x/x type of function. Alternatively, one could postulate spectra which have sinusoidal distributions within a prescribed bandwidth; the time functions would then be of sin x/x type, and still orthogonal. Reception would be by synchronous demodulator, and square waves would be transmitted as synchronizing signals. The signal/noise advantage of a 16-element 32-character orthogonal code is calculated relative to a 5-element 32-character teletype code.
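One standard way to obtain a binary code with the properties quoted in the Harmuth abstract (32 characters of 16 digits, mutual distances 8 or 16) works as follows. Take the rows of a 16 × 16 Sylvester-type Hadamard matrix together with their negatives, then map +1 to 0 and -1 to 1. The abstract does not say how Harmuth's binary set was built, so this construction is an assumption offered only for illustration.

```python
def sylvester_hadamard(m):
    """m x m Hadamard matrix with +1/-1 entries (m must be a power of 2)."""
    h = [[1]]
    while len(h) < m:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def biorthogonal_code(m):
    """2m binary characters of length m: Hadamard rows and their negatives,
    with +1 mapped to 0 and -1 mapped to 1."""
    h = sylvester_hadamard(m)
    vectors = h + [[-v for v in row] for row in h]
    return [[0 if v == 1 else 1 for v in row] for row in vectors]


def distance(a, b):
    """Number of digit positions in which two characters differ."""
    return sum(x != y for x, y in zip(a, b))


code = biorthogonal_code(16)
distances = {distance(a, b) for i, a in enumerate(code) for b in code[i + 1:]}
```

Two distinct orthogonal rows agree in exactly half of their 16 positions, giving distance 8. A row and its own negative differ everywhere, giving distance 16.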


Optimum Combination of Pulse Shape and Filter to Produce a Signal Peak upon a Noise Background—H. S. Heaps (in English). (IEE Monograph No. 407E; October, 1960.)

The author seeks the optimum shape of the transmitted pulse and the transfer function $H(\omega)$ of the best linear filter for detecting this pulse against a background of noise having power spectrum $|\sigma(\omega)|^2$, after transmission through a system whose transfer function (due both to the medium and to the transducers) is $T(\omega)$. The pulse is regarded as made up of $n$ samples spaced by $\tau$ over a total pulse duration $d$. Examples are quoted for a noise/transmission relationship of the form $|\sigma(\omega)|^2/|T(\omega)|^2 = \exp(-k^2\omega^2)$, with $n = 3$ or 10 and $d/2k = 6$, 9 or 15. For $n = 3$ the optimum shape is found to be a cycle of oscillation roughly equivalent to three pulses in the sequence positive, negative, positive. The ratio of the squared peak signal in the output to the mean-square noise in the output, modified in accordance with the forms of $\sigma(\omega)$ and $T(\omega)$, has a maximum value denoted by $\lambda_0$. For a fixed pulse length $d$, the optimum value of $\lambda_0$ increases very rapidly as $n$ increases.


The Transmission of Discrete Information through Periodic and 
Almost-Periodic Channels—K. Jacobs (in German). (Math. Ann., 
vol. 137, pp. 125-135; 1959.) 


This mathematical paper extends Khinchin's proofs of the theorems of McMillan, Feinstein and Shannon (I and II) to almost-periodic (in particular, periodic) channels. The ergodic capacity of such a channel is the same as that of its stationary average. An almost-periodic source has the same entropy as its stationary average. Feinstein's theorem and Shannon I are valid for almost-periodic channels. This may have an application in satellite communication.


Adaptive Waveform Recognition—C. V. Jakowatz, et al. (in English). (GE Res. Lab., Schenectady, N. Y., Rept. No. 60-RL-2435E, May, 1960; Rept. No. 60-RL-2353E (Revised), September, 1960.)

This report describes an adaptive waveform recognition system capable of picking out a randomly occurring signal perturbed by additive noise. The system was constructed in the form of a self-adaptive matched filter that learns with experience to adjust its impulse response so that it automatically forms the time inverse of the signal mentioned above. Furthermore, it is able to display its current concept of what the signal is. Provided the conditions for initiating convergence are met, then with infinite experience (i.e., time) the adaptive filter will approach the ideal matched filter. In practice, infinite experience cannot be realized, and the adaptive filter is inferior to a predesigned matched filter. Its utility lies in applications where a priori design information is not available or is only partially available.

Two methods of operating an adaptive filter of the above type are theoretically investigated. In the priming method, the first approximation of the filter, i.e., the first step in learning, is the adjustment of the filter so that it is the matched filter of a random sample of its input. If experience indicates that there was no signal in that random sample, then the filter will reject it and make another trial. In nonpriming operation, the first approximation to the desired matched filter is continuously changing in a random fashion. Convergence begins when the changing filter reaches a state in which the signal is a component of the matched filter.

An adaptive filter has been constructed consisting of a 10-tap delay line with a 500-cps cutoff frequency. The gain of any tap is determined by the previous experience of the filter. The memory associated with experience consists of condensers. The arithmetic operations associated with that memory are based upon relay switches. The constructed machine operates in the nonpriming mode. The filter readjusts itself whenever the correlation between the matched filter and the incoming waveform exceeds a given threshold. Both filter properties and threshold value are functions of the past experience of the filter.

Performance curves are presented which indicate the performance of the filter as a function of Woodward's R and the filter parameters. In general, convergence is rapidly initiated for values of R greater than about 10. At this value of R, it is difficult for a human observer to detect randomly occurring undefined events visually or acoustically.


An Extension of N. Wiener’s Prediction Theory—J. Kondo (in 
English). (J. Operations Res. Soc. Japan, vol. 2, pp. 124-129; 
January, 1960.) 


N. Wiener has introduced the so-called Wiener-Hopf integral equation for a predictor $K(t)$ of a continuous time series $f(t)$ in his prediction theory. It is noted, however, that the factorization technique must be used to find $K(t)$ from this equation, and this technique is sometimes very difficult to carry out when the autocorrelation function of $f(t)$ is not expressed in a simple form.

The present paper deals with the prediction of a time series $f(t)$ by means of another time series $g(t)$, taking account of the cross correlation between these two time series. In this case, we have a singular integral equation for $K(t)$, and the solution for $K(t)$ can be obtained in general without applying the factorization technique. When we assume $f(t) = g(t)$, the result reduces to the Wiener case. Therefore, this method includes the Wiener prediction theory as a special case.
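A discrete-time analogue may help make the role of the cross correlation concrete. The sketch below is not Kondo's continuous-time method. It assumes a finite predictor $\hat f(t+\alpha) = \sum_{j=0}^{p-1} h_j\, g(t-j)$ and solves the resulting normal (Wiener-Hopf) equations $\sum_j h_j R_{gg}(i-j) = R_{fg}(\alpha+i)$ directly as a linear system, which needs no spectral factorization. Here $R_{fg}(m) = E[f(t+m)\,g(t)]$. The correlation functions and all names are hypothetical inputs chosen for illustration. With $f = g$ the procedure reduces to ordinary Wiener prediction.

```python
def solve(a, b):
    """Solve the linear system a h = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= factor * m[c][j]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        h[r] = (m[r][n] - sum(m[r][j] * h[j] for j in range(r + 1, n))) / m[r][r]
    return h


def predictor_taps(r_gg, r_fg, alpha, p):
    """Taps h_0..h_{p-1} of f_hat(t + alpha) = sum_j h_j * g(t - j).

    r_gg(k) = E[g(t + k) g(t)] and r_fg(m) = E[f(t + m) g(t)] are supplied as functions.
    Orthogonality of the error to g(t - i) gives sum_j h_j r_gg(i - j) = r_fg(alpha + i).
    """
    a = [[r_gg(i - j) for j in range(p)] for i in range(p)]
    b = [r_fg(alpha + i) for i in range(p)]
    return solve(a, b)


def ar1_correlation(k, coeff=0.8):
    """Autocorrelation of a unit-variance first-order autoregressive sequence."""
    return coeff ** abs(k)


# f = g (ordinary Wiener prediction), one step ahead, three taps: close to [0.8, 0, 0].
taps_same = predictor_taps(ar1_correlation, ar1_correlation, 1, 3)

# g leads f by one step, f(t) = g(t - 1), so r_fg(m) = r_gg(m - 1) and f(t + 1) = g(t).
taps_cross = predictor_taps(ar1_correlation, lambda m: ar1_correlation(m - 1), 1, 3)
```

The second case shows why the cross correlation matters. When $g$ runs ahead of $f$, a single tap predicts $f$ exactly, which no predictor built from $f$'s own past could do.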


The Output Spectral Density of a Detector Operating on an FM CW Radar Signal in the Presence of Band-Limited White Noise—Lait and A. J. Hymans (in English). (IEE Monograph No. 412B; October, 1960.)

In a radar system in which the returned echo is made to beat with the transmitted signal, the output from the detector will include the following components: 1) the desired beat note; 2) the normally detected noise; and 3) a random signal produced by interaction between the FM wave and the noise. It is assumed that the detector output results from interactions between reference signal and noise and between reference signal and echo, but that interaction between echo and noise is negligible. Using the detector model proposed by Lawson and Uhlenbeck, noise spectral distributions are deduced both for the quadratic detector and for the linear detector with small and large SNR's. In general, the predetector bandwidth should be no greater than is needed to pass the echo and reference signal; but for very small targets at short range, an increase in bandwidth may move some of the noise power away from the part of the spectrum occupied by the signal. In every case, the postdetector bandwidth should be kept as small as is consistent with the required information rate.


The Indeterminacies of Measurements Using Pulses of Coherent Electromagnetic Energy—R. Madden (in English). (IEE Monograph No. 417E; November, 1960.)

An idealized radar transmitter emits a single pulse of wavelength $\lambda$ and duration $\tau$ at time $t = t_0$. The associated receiving system comprises a paraboloid antenna of diameter $2a$; an array of signal detectors in the focal plane of the antenna; associated with each detector, a bank of filters for determining the spectral analysis of the echo signal; and associated with each filter, a clock for determining the time elapsed between the transmission of the pulse and the arrival of a particular frequency component in the echo. The angular resolving power corresponding to an antenna of this aperture is $\Delta\phi = \lambda/2a$, but in order to achieve this, the whole aperture must be illuminated simultaneously. In general, this requires a pulse length $\tau$ such that $c\tau > 2a$. However, the range resolution $\Delta R$ is of order $\frac{1}{2}c\tau$, so that $\Delta\phi\,\Delta R \approx \lambda/2$. Similarly, the accuracy with which radial velocity $V_r$ can be found (by Doppler effect) depends on the mean frequency and the duration of the pulse, and it is shown that $\Delta R\,\Delta V_r \approx \lambda c/4$. Tangential velocity causes the signal to move from one detector to the next, but unless the signal dwells for the full time $\tau$ on each detector, the accuracy of determination of radial velocity will suffer. Similarly, radial acceleration causes the signal frequency to change and so limits the accuracy of determination of radial velocity. The use of nonsimultaneous measurements (e.g., pulse trains or modulated-wave systems) produces ambiguities such as $R_{\mathrm{amb}}\,V_{\mathrm{amb}} = \lambda c/4$.
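The three products quoted in the Madden abstract follow directly from two assumptions of ours, neither stated in the abstract. The first is the order-of-magnitude rule that a pulse of duration $\tau$ resolves Doppler frequency to about $1/\tau$. The second is a hypothetical pulse-repetition frequency, written $f_r$.

$$\Delta\phi\,\Delta R \approx \frac{\lambda}{2a}\cdot\frac{c\tau}{2} \ge \frac{\lambda}{2a}\cdot a = \frac{\lambda}{2} \qquad (c\tau \ge 2a),$$

$$f_d = \frac{2V_r}{\lambda},\quad \Delta f_d \approx \frac{1}{\tau} \;\Rightarrow\; \Delta V_r \approx \frac{\lambda}{2\tau}, \qquad \Delta R\,\Delta V_r \approx \frac{c\tau}{2}\cdot\frac{\lambda}{2\tau} = \frac{\lambda c}{4},$$

$$R_{\mathrm{amb}} = \frac{c}{2f_r},\quad V_{\mathrm{amb}} = \frac{\lambda f_r}{2} \;\Rightarrow\; R_{\mathrm{amb}}\,V_{\mathrm{amb}} = \frac{\lambda c}{4}.$$

The first line shows why the angle-range product is at best about $\lambda/2$. Shortening the pulse to sharpen the range resolution eventually prevents simultaneous illumination of the whole aperture.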


Determination of the Structure of a Majority-Decision Element by the Method of Linear Programming—S. Muroga, et al. (in Japanese). (J. Inst. Elec. Commun. Engrs. Japan, vol. 43, pp. 1408-1416; December, 1960.)

A majority-decision element is an element in which a finite number of inputs, having weights (coupling numbers), are coupled with one output. The output value is one or zero, and is decided by majority decision depending on the coupling numbers. The number of Boolean functions which can be realized by a single majority-decision element is rather small. Thus, it is necessary to determine the category of such Boolean functions (majority-decision functions). Using the method of linear programming, we have developed a criterion concerning whether a given Boolean function can or cannot be realized by a single majority-decision element, and this method also determines the most economical structure (coupling numbers and threshold) of a majority-decision element realizing the function. In the formulation of linear programming, the number of constraints is considerably reduced by the properties of majority-decision functions. A table is given of majority-decision functions of five or fewer variables and the structure of majority-decision elements; these are calculated by the above method.
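For very small numbers of inputs, an exhaustive search over small integer coupling numbers can check whether a Boolean function is realizable by a single majority-decision element. The sketch below uses such a brute-force search in place of the linear-programming method of the paper. It assumes that the element outputs 1 when the weighted input sum reaches the threshold. It also measures "economy" by the total magnitude of the coupling numbers, which is a hypothetical choice. The search is practical only for a handful of inputs.

```python
from itertools import product


def realizes(weights, threshold, f, n):
    """True if the element outputs 1 exactly when sum(w_i * x_i) >= threshold."""
    for x in product((0, 1), repeat=n):
        fires = sum(w * xi for w, xi in zip(weights, x)) >= threshold
        if fires != bool(f(x)):
            return False
    return True


def find_structure(f, n, max_weight=3):
    """Brute-force search over integer coupling numbers in [-max_weight, max_weight].

    Returns (weights, threshold) with the smallest total |weight| (a hypothetical
    'economy' measure), or None if no single element in the search range works.
    """
    best = None
    for w in product(range(-max_weight, max_weight + 1), repeat=n):
        cost = sum(abs(v) for v in w)
        if best is not None and cost >= best[0]:
            continue
        for t in range(-n * max_weight, n * max_weight + 2):
            if realizes(w, t, f, n):
                best = (cost, w, t)
                break
    return None if best is None else (best[1], best[2])


majority = find_structure(lambda x: sum(x) >= 2, 3)    # three-input majority
conjunction = find_structure(lambda x: x[0] and x[1], 2)
exclusive_or = find_structure(lambda x: x[0] ^ x[1], 2)  # not a majority-decision function
```

The exclusive-OR search returns `None`, the classic example of a function that no single element of this kind can realize. The abstract's linear-programming formulation avoids the exponential search used here.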


On the Noise Figure of Low-Gain Stages of Amplifiers and its 
Measurements—T. Namekawa (in Japanese). (J. Inst. Elec. 
Commun. Engrs. Japan, vol. 43, pp. 1329-1334; November, 1960.) 


The theory of noise figures has been known for many years. In many cases, the first stages of amplifiers are designed for good noise performance, and the noise figure has been used as the criterion. When the power gains are comparatively small, however, the noise figure alone is not sufficient: the power gain must be taken into consideration as well. The author introduces here the definition of an “Iterative Noise Figure” $F_i$:


$$F_i = 1 + \frac{F - 1}{1 - 1/G}$$
This is useful for determining the noise performance of low-gain first stages, and the main part of $F_i$ is the same as the “Noise Measure” $M$ developed by Haus and Adler. The methods of measuring the Iterative Noise Figure or the Noise Measure are discussed. It is possible to determine the values of $F_i$ or $M$ by direct reading from the measurement of one stage under test.
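Suppose $F$ and $G$ are power ratios (not decibels). Then $F_i$ equals the overall noise figure of an infinite cascade of identical stages, because the Friis formula gives $1 + (F-1)(1 + 1/G + 1/G^2 + \cdots)$ for that cascade. The sketch below checks this numerically and compares two hypothetical candidate first stages. The cascade interpretation is our reading of the definition, not a claim about the paper's derivation.

```python
import math


def noise_measure(F, G):
    """Haus-Adler noise measure M = (F - 1) / (1 - 1/G); F and G are power ratios."""
    if G <= 1:
        raise ValueError("defined only for power gain G > 1")
    return (F - 1) / (1 - 1 / G)


def iterative_noise_figure(F, G):
    """F_i = 1 + (F - 1) / (1 - 1/G) = 1 + M."""
    return 1 + noise_measure(F, G)


def friis_cascade(F, G, stages):
    """Overall noise figure of `stages` identical stages by the Friis formula."""
    total, gain = F, G
    for _ in range(1, stages):
        total += (F - 1) / gain
        gain *= G
    return total


# Two hypothetical candidate first stages: (noise figure, power gain) as ratios.
for name, F, G in [("stage A", 1.5, 1.5), ("stage B", 2.0, 10.0)]:
    Fi = iterative_noise_figure(F, G)
    print(f"{name}: F = {10 * math.log10(F):.2f} dB, F_i = {10 * math.log10(Fi):.2f} dB")
```

Stage A has the lower noise figure (about 1.76 dB against 3.01 dB) but the larger iterative noise figure (about 3.98 dB against 3.25 dB). This is the situation the abstract describes, in which the noise figure alone would pick the wrong stage.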


Prediction Theory and Dynamic Programming, II—T. Odanaka 
(in English). (J. Operations Res. Soc. Japan, vol. 3, pp. 88-92; 
October, 1960.) 


The theory of prediction given in this paper is an extension of 
the previous paper presented at the International Statistical Insti- 
tute, 32nd Session, 34, 1959. In the previous paper, we were con- 
cerned with the problem of separating a message from a signal, 
the message being represented by a discrete time sequence described 
statistically by a given autocorrelation function; and the signal 
being represented by still another sequence with a given auto- 
correlation function and a cross-correlation function with respect 
to the message. This paper presents some applications of the functional-equation technique of Dynamic Programming to the numerical treatment of this extended prediction theory.


Some Remarks on the Capacity of a Communication Channel— 
M. Sakaguchi (in English). (J. Operations Res. Soc. Japan, vol. 3, 
pp. 124-132; January, 1961.) 


The transmission of information requires the presence of a source 
of information coupled with an appropriate channel. An information 
system is described in terms of joint probabilities of inputs and 
outputs, and a channel is defined by its transition probabilities. 
The author discusses a close connection between the capacity 
theorem and the matching theorem. This paper presents a general 
theorem which includes these two theorems as the two special 
cases. An interpretation of capacity is given by introducing cost 
considerations into the information system.


Shift Registers Generating Maximum-Length Sequences—P. H. R. Scholefield (in English). (Electronic Tech., vol. 37, p. 389; October, 1960.)

A cyclic sequence which contains all possible combinations of $s$ binary digits (except the all-zero combination) may be used as a source of pseudo-random numbers, or for the generation of digital codes. Such a sequence may be described by a recurrence formula $a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_s a_{n-s}$, and it may be generated by means of an $s$-stage shift register with back connections from various stages to the input. In a binary system $c_i = 0$ or 1, and the arithmetic is modulo 2. $F(x) = 1 - \sum_{i=1}^{s} c_i x^i$ is called the characteristic polynomial, and the condition for the register to generate a full sequence, of length $2^s - 1$, is that $F(x)$ has no factors and does not divide $x^k + 1$ for any $k$ less than $2^s - 1$. It is then shown to be advantageous to replace a single register with multiple taps generating the feedback to the input by several registers of the same aggregate length. Taps on the first partial register are combined to give an input to the second, etc., and taps on the last provide an input to the first. Examples show the saving in logical circuits which can be secured by this modification.
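The condition in the Scholefield abstract can be checked by simulation. The sketch below (function names hypothetical) runs the recurrence modulo 2 from a nonzero initial state and reports the period. $F(x) = 1 + x + x^4$ gives the full length $2^4 - 1 = 15$. By contrast, $1 + x + x^2 + x^3 + x^4$ has no factors but divides $x^5 + 1$, so it gives only 5, and $1 + x^4 = (1+x)^4$ gives 4. The sketch models a single register, not the multi-register modification the paper proposes.

```python
def lfsr_step(state, taps):
    """Advance a_n = sum(a_{n-i} for i in taps) mod 2.

    `state` lists [a_{n-s}, ..., a_{n-1}]; `taps` are the i (1..s) with c_i = 1.
    Returns the new state and the newly generated digit a_n.
    """
    new = 0
    for i in taps:
        new ^= state[-i]
    return state[1:] + [new], new


def lfsr_period(taps, s):
    """Period of the register started from the nonzero state 0...01 (None if it never returns)."""
    start = [0] * (s - 1) + [1]
    state = start
    for period in range(1, 2 ** s + 1):
        state, _ = lfsr_step(state, taps)
        if state == start:
            return period
    return None


def lfsr_sequence(taps, s, length):
    """First `length` digits generated from the state 0...01."""
    state, out = [0] * (s - 1) + [1], []
    for _ in range(length):
        state, digit = lfsr_step(state, taps)
        out.append(digit)
    return out


# F(x) = 1 + x + x^4: c_1 = c_4 = 1, full period 15.
# F(x) = 1 + x + x^2 + x^3 + x^4: no factors, but divides x^5 + 1, period 5.
# F(x) = 1 + x^4 = (1 + x)^4: has factors, period 4.
periods = [lfsr_period(t, 4) for t in ([1, 4], [1, 2, 3, 4], [4])]
```

Over one full period of 15 digits, every nonzero combination of 4 successive digits (read cyclically) appears exactly once. This is the "all possible combinations" property the abstract relies on.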


On the Sampling Theorem of the Second Kind—H. Wolter (in German). (Arch. elekt. Übertragung, vol. 13, pp. 477-484; 1959.)

In an earlier paper, the author has shown that if an object function of finite extension $G$ is imaged by an optical information channel of aperture $2W$, one can get more than $2WG$ information data from it. In this paper, he asks whether it is possible to get more than $2WG'$ information data from an image of extension $G'$, and again gives a positive answer. He does not deny the Whittaker Interpolation Theorem, but shows that it is easy to derive wrong conclusions from it.


The following papers were published singly by the Professional Group on Information Theory (I) and the Professional 
Group on Automata and Automatic Control (A) of the Institute of Electrical Communication Engineers of Japan, 2-8, 
Fujimicho, Chiyodaku, Tokyo, Japan. All are in Japanese, except as noted; English abstracts are given when available.


Topological Considerations in Information Recognition (I; October 
21, 1960)—H. Enomoto. 


In this paper, the topological characteristics of a connecting 
relation of information are considered. It is proved that the space 
having the same characteristics as the connecting relation of informa- 
tion is topologically homeomorphic with a multihole torus. Some 
topological considerations are applied to the information recognition 
process. 


A Few Considerations on Pattern Recognition (A; December 8, 
1960)—Y. Iijima. 


A Computational Method for Speech Recognition (A; September 8, 
1960)—S. Inomata (in English). 


A computational program for stationary vowel recognition is 
proposed; it is called SNCS (Speech Normalizing and Comparing 
Scheme). In this, the first gestalt properties of the input speech, 
such as amplitude, time origin, time scale factor and phase distortion, 
are normalized by a normalizing program composed of Fourier and 
S-transforms. “Active recognition” of the input speech is done by
the comparison of the normalized input speech with similarly 
normalized kernel speech generated by a speech-generating program. 
In the course of this comparison operation, the second gestalt 
properties of speech, such as differences of individual pronunciation 
and of male and female speech, are completely normalized. A special 
program, developed to normalize the stationary vowel with respect 
to its duration, is also incorporated. The distinctive features of 
this speech recognition program are its “developing” and
“statistical” learning abilities. 


Generation of Speech by a Digital Computer (I; September 30, 1960)—S. Inomata, et al.


A digital computer has been successfully programmed to generate 
five stationary Japanese vowels. Speech waves have been generated 
by the approximate evaluation of the Faltung-type (convolution) integral describing the human speech-generating process by means of the simple
weighted-sum method. Consideration is also given to the extension 
of this program to both consonants and nonstationary vowels. 


Modification of a Speech-Normalizing Algorithm (I; October 21, 
1960)—S. Inomata. 


Four modifications of the speech-normalizing algorithm involved 
in the author’s SNCS scheme (see above) have been proposed; 
these can be executed on a digital computer somewhat faster than 
the original one. The computation time and accuracy of each modifi- 
cation are discussed. 




A Simple Speech Synthesizer—D. J. Woilons and A. M. R. Gil (in English). (Electronic Tech., vol. 37, p. 373; October, 1960.)

Intelligible speech is synthesized on the basis of reproducing two formant frequencies, in the ranges 200-1200 cps and 1000-2400 cps, respectively, with fricative excitation. The formant frequencies are controlled by two resonant circuits, in which the damping is reduced by a valve amplifier and the frequency is controlled by using a reverse-biased diode junction as the tuning capacitor. These resonant circuits are excited by an electrical noise source.

On a two-coordinate plot, normal sounds are represented by points (indicating the values of the two formant frequencies) and diphthongs by trajectories. The synthesizer is provided with a joy-stick control for varying the two formant frequencies simultaneously. Intelligibility tests showed approximately 70 per cent correct identification for synthesized single sounds and nearly 100 per cent for words, short phrases and sentences.


Synthesis of Speech-Recognizing Algorithm with Learning Abilities: Application of the Golf Method (I; November 11, 1960)—S. Inomata.

Consideration is given to the incorporation of learning processes into the SNCS speech-recognizing algorithm (see above). Higher-order learning processes, in which the mode of operation is changed, are excluded from this first approach. The so-called “inner parameter space” is described, and the learning process is formulated as a difficult problem in nonlinear programming. In order to solve this problem, a powerful method of nonlinear programming, the “Golf Method,” is proposed and applied.


A Vocoder for Voice Research (A; September 8, 1960)—S. Inoue. 


A Topological Approach to the Construction of Group Codes (J; 
January 17, 1961)—T. Kasami. 


This paper presents a systematic procedure for finding quasi- 
perfect group codes with given m and d, where m is the number of 
parity-check digits and d is the nearest-neighbor distance. This 
procedure may be suitable for digital-computer programming. In 
particular, in the case where d < 5, at least one quasi-perfect group
code can be obtained through the first few steps.

The paper also proposes a topological method of group-code 
construction which is based on the above-mentioned procedure. 
For moderate values of m, quasi-perfect codes with d = 5 can be 
obtained rather easily through this method. Four examples are 
given.
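For small codes, the parameters named in this abstract can be checked by exhaustive search. The Python sketch below is not Kasami's topological procedure. It is a brute-force checker built on our own assumptions: the hypothetical helper `group_code_parameters` takes a binary parity-check matrix H (m rows, n columns) and finds the nearest-neighbor distance d and the covering radius (the weight of the heaviest coset leader). It calls the code quasi-perfect when the covering radius is at most t + 1, where t = (d − 1) // 2. Its cost grows as 2^n, so it is practical only for short codes.

```python
from itertools import product


def syndrome(H, x):
    """Syndrome of the binary vector x under parity-check matrix H (mod 2)."""
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)


def group_code_parameters(H):
    """Return (d, covering_radius, quasi_perfect) for the group code defined by H.

    Brute force over all 2**n binary vectors, so only suitable for small n.
    """
    n = len(H[0])
    d = None       # smallest weight of a nonzero codeword (zero syndrome)
    leader = {}    # syndrome -> weight of the lightest vector in that coset
    for x in product((0, 1), repeat=n):
        w = sum(x)
        s = syndrome(H, x)
        if s not in leader or w < leader[s]:
            leader[s] = w
        if w > 0 and not any(s) and (d is None or w < d):
            d = w
    if d is None:
        raise ValueError("H admits no nonzero codeword")
    covering_radius = max(leader.values())
    t = (d - 1) // 2  # number of errors the code is guaranteed to correct
    return d, covering_radius, covering_radius <= t + 1
```

For the (7, 4) Hamming code, whose three-row parity-check matrix has the binary numbers 1 to 7 as columns, the function returns (3, 1, True).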


On a Few Problems of Analog-to-Digital Converters (A; January 
16, 1961)—O. Kawatori and A. Kitamura.


FM-Like Characteristics of the Fundamental Frequencies of Speech
Sounds (I; October 21, 1960)—T. Koshikawa.


Economics of Coding in Parts Manufacturing (I; December 16,
1960)—H. Kubokoya.




On the Precise Measurements of the Difference Between Two
Velocities (A; January 16, 1960)—Y. Matsumoto and N. Tatsuta. 


A General Statistical Theory of Noise Measurements (I; January
17, 1961)—M. Ota and M. Nakagami. 


This paper describes certain basic theories and properties con- 
nected with the measurement of noise through detector circuits. 




Coding of Japanese Monosyllables (A; January 16, 1961)—
Sakai, et al.


Basic Design of Pattern-Recognition Apparatus (A; January 16,
1961)—T. Sakai and T. Fukinuki.


Applications of Miyakawa’s Multidimensional Sampling Theorem
(I; September 30, 1960)—K. Sasakawa.


Coding for an Automatic Reading Apparatus (I; January 17, 
1961)—S. Shirai and H. Sakaguchi.


A description is presented of how to code alphanumerical
characters read by an automatic reading apparatus. Pulses resulting
from scanning letters vertically are distinguished as short, medium
or long, according to length. The number of pulses in each scanning
is counted, and from this, characteristic patterns of the letters
are obtained. These patterns are sorted into groups by several
criteria, and the pattern of a scanned letter is compared with
standard patterns. It was found that the letter could be identified
within a tolerable margin by associating it with the standard pattern
that has the smallest number of noncoincidences with it.
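The matching rule at the end of this abstract can be sketched as follows. This is a minimal illustration, not the authors' apparatus. The pulse-length thresholds, the per-column (short, medium, long) count encoding and all helper names are hypothetical. The grouping step is omitted, and each scanned pattern is assumed to have the same number of columns as the standard patterns.

```python
def classify_pulse(length, short_max=3, medium_max=6):
    """Label a pulse length as 'S', 'M' or 'L' (thresholds are hypothetical)."""
    if length <= short_max:
        return "S"
    if length <= medium_max:
        return "M"
    return "L"


def scan_pattern(columns):
    """Turn per-column lists of pulse lengths into (S, M, L) counts per column."""
    pattern = []
    for pulses in columns:
        labels = [classify_pulse(p) for p in pulses]
        pattern.append((labels.count("S"), labels.count("M"), labels.count("L")))
    return tuple(pattern)


def noncoincidences(a, b):
    """Number of column positions where two equal-length patterns differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def identify(columns, standards):
    """Return the character whose standard pattern has the fewest noncoincidences."""
    pattern = scan_pattern(columns)
    return min(standards, key=lambda ch: noncoincidences(pattern, standards[ch]))
```

With hypothetical standards such as `{"T": ((1, 0, 0), (0, 0, 1), (1, 0, 0)), ...}`, a slightly corrupted scan is still assigned to the closest stored pattern instead of requiring an exact match.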


Model for the Transmission of Speech by Recognition (I; November
11, 1960, and A; December 8, 1960)—G. Suzuki and K. Nagata.


A simple preliminary model of an efficient speech-transmission
system using recognition is presented. This model is limited to the
transmission of the “phonetic quality” of speech, i.e., the information
necessary to identify a speech sound as a linguistic code, and
not the “vocal quality,” which provides information regarding
emotion and personality. The model can recognize vowels in C-V-type
syllables, code the decision into teletype signals for transmission,
and reproduce the vowels at the receiving end by a speech
synthesizer. The recognition scheme is based simply on a frequency analysis
of the input speech wave; the envelope of the input speech provides
supplemental information on timing. A series of recognition tests
reveals that this simple model can recognize vowels in C-V-type
syllables with an accuracy of 100-80 per cent for a single male
speaker, and has an average score of about 70-50 per cent for a
group of 5-9 male speakers.


On the Number of Types of Self-Dual Logical Functions (I;
December 16, 1960)—I. Toda.


Formulas for the number of self-dual logical functions of n vari- 
ables and for the number of their symmetry types are derived with 
the aid of a modified Slepian method. The numbers are tabulated 
for the cases of six or fewer variables.
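The number of self-dual functions, as opposed to their symmetry types, has a simple closed form: 2^(2^(n−1)). The 2^n inputs split into 2^(n−1) complementary pairs, and a self-dual f is free on one member of each pair and forced on the other. The brute-force sketch below confirms this for small n. Counting symmetry types, which is the paper's main subject and uses Slepian's method, is harder and is not attempted. The function name is hypothetical.

```python
from itertools import product


def count_self_dual(n):
    """Brute-force count of self-dual Boolean functions of n variables.

    f is self-dual when f(not x) == not f(x) for every input x.
    Enumerates all 2**(2**n) truth tables, so keep n <= 4.
    """
    inputs = list(product((0, 1), repeat=n))
    index = {x: i for i, x in enumerate(inputs)}
    # complement[i] is the position of the bitwise complement of inputs[i]
    complement = [index[tuple(1 - b for b in x)] for x in inputs]
    count = 0
    for table in product((0, 1), repeat=len(inputs)):
        if all(table[complement[i]] == 1 - table[i] for i in range(len(inputs))):
            count += 1
    return count
```

For n = 1 to 4 the counts are 2, 4, 16 and 256, matching 2^(2^(n−1)).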


The Optimal Filter in the Phase-Locked FM Demodulator (I; 
January 17, 1961)—T. Tsumura and S. Kobayashi.


The optimal filter for a phase-locked FM demodulator is deter- 
mined using Wiener’s least-mean-square method and the following 
design criteria: 1) the noise of the signal due to noise interference 
should be minimized, and 2) the transient error between the output 
and the desired operation on the input, for a specific input, should be 
maintained at a specified level. 
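The least-mean-square criterion mentioned here can be illustrated with a finite-length, sampled-data version of the Wiener filter. The sketch below is a stand-in built on our own assumptions, not the authors' phase-locked design. It ignores the loop and the transient-error constraint, and simply chooses FIR coefficients w that minimize the summed squared error between a desired signal d and the filtered observation y by solving the normal equations R w = p. The function names and the AR(1)-plus-noise test signal are hypothetical.

```python
import random


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fir_wiener(y, d, taps):
    """FIR coefficients minimizing sum_k (d[k] - sum_j w[j] * y[k - j]) ** 2."""
    rows = [[y[k - j] for j in range(taps)] for k in range(taps - 1, len(y))]
    targets = d[taps - 1:]
    R = [[sum(r[i] * r[j] for r in rows) for j in range(taps)] for i in range(taps)]
    p = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(taps)]
    return solve(R, p)


def apply_fir(w, y):
    """Filter y with coefficients w, keeping only fully overlapped outputs."""
    return [sum(w[j] * y[k - j] for j in range(len(w))) for k in range(len(w) - 1, len(y))]


def demo(seed=1, n=2000, taps=8):
    """Hypothetical test: AR(1) signal x observed as y = x + white noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(0.95 * x[-1] + rng.gauss(0.0, 1.0))
    y = [xi + rng.gauss(0.0, 2.0) for xi in x]
    w = fir_wiener(y, x, taps)
    ref = x[taps - 1:]
    mse_raw = sum((yi - xi) ** 2 for yi, xi in zip(y[taps - 1:], ref)) / len(ref)
    mse_filtered = sum((zi - xi) ** 2 for zi, xi in zip(apply_fir(w, y), ref)) / len(ref)
    return mse_raw, mse_filtered
```

Because the pass-through filter w = [1, 0, ..., 0] is one of the candidates, the least-squares solution can never do worse than the raw observation on the data it was fitted to.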


The following papers appear in the “Transactions of the Second Prague Conference on Information Theory, Statistical
Decision Functions, and Random Processes” (June 1-6, 1959). They were published by the Publishing House of the
Czechoslovak Academy of Sciences, Prague, Czechoslovakia, 1960. The affiliations of the authors are given below, as are
some abstracts.


Weakly Markov Queues—V. E. Benes (in English). (Bell Telephone
Labs.)


A stochastic process W(t) describes the length of time a customer
would have to wait in a queue if he arrived at time t. There is one
server, service is in order of arrival, and there are no defections.
Some mild assumptions of stationarity are formulated, and some
generalized irrelevance or “weak Markov” conditions are described;
it is hoped that these will be helpful in problems other than queueing.
The usefulness of the new approach is illustrated by showing that
it yields close general analogs of results previously known only for
the special case of Poisson arrivals and independent service times.
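The quantity W(t) can be computed directly for a given sample path. The sketch below is our own and is not Benes's method. It assumes one first-come, first-served server with known arrival instants and service times, and it tracks the unfinished work, which drains at unit rate and jumps by the service time at each arrival; W(t) is the work still in the system at time t. The companion function applies the standard Lindley recursion to the waiting times of the actual customers. Both function names are hypothetical.

```python
def virtual_waiting_time(arrivals, services, t):
    """W(t): how long a customer arriving at time t would wait (one FIFO server).

    arrivals must be sorted; services[k] is the service time of the customer
    arriving at arrivals[k]. Unfinished work drains at unit rate.
    """
    work, clock = 0.0, 0.0
    for a, s in zip(arrivals, services):
        if a >= t:
            break
        work = max(0.0, work - (a - clock)) + s  # drain since last event, then add job
        clock = a
    return max(0.0, work - (t - clock))


def waiting_times(interarrivals, services):
    """Lindley recursion W[k+1] = max(0, W[k] + S[k] - A[k]); first customer waits 0.

    interarrivals[k] is the gap between customers k and k + 1.
    """
    w = [0.0]
    for s, a in zip(services, interarrivals):
        w.append(max(0.0, w[-1] + s - a))
    return w
```

For arrivals at 0, 1, 2 with service times 2, 2, 0.5, the third customer waits 2.0, and a hypothetical customer arriving at t = 2.5 would also wait 2.0.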


Random Solutions of Integral Equations in Banach Spaces—
A. T. Bharucha-Reid (in English). (Univ. of Oregon and Math.
Inst. of the Polish Acad. Sci., Wroclaw, Poland.)


Our purpose in this paper is to study some problems in what
might be called the theory of random operator equations. We begin
with the study of random operator equations in Banach spaces,
with particular interest in the existence and measurability of the
random resolvent operator associated with a random operator. We
then present a general discussion of the stochastic boundary value
problem. After a brief discussion of Orlicz spaces and their properties,
and of generalized random variables with values in an Orlicz space,
we consider random Fredholm integral equations of the second
kind in Orlicz spaces. The existence of a random solution of the
Fredholm integral equation is established. Finally, we discuss
results obtained for other Banach spaces and mention several
other integral equations that are being studied within the framework
of probabilistic functional analysis.


Finite-State Channels—L. Breiman (in English). (Univ. of California,
Los Angeles.)


Finite-state channels form an elegant and simple generalization
of zero-memory channels. The fundamental theorem of information


theory (Shannon’s theorem) has previously been proven for finite- 
state channels under the restriction that the channel be indecom- 
posable. This condition, however, is quite restrictive and difficult 
to verify. In this paper we first redefine the channel capacity so 
that Shannon’s theorem holds for the general finite-state channel. 
We then proceed to the question of when channels may be con- 
sidered as being decomposable into subchannels, and give a simple 
answer. Finally, we derive a number of inequalities to facilitate the 
actual computation of channel capacity. 


A Relative Limit Theorem for Parabolic Functions—J. L. Doob 
(in English). (Univ. of Illinois.)


Convergence of Compact Measures on Metric Spaces—M. Driml 
(in English). (Czechoslovak Acad. Sci., Prague.) 


On Experience Theory Problems—M. Driml and O. Hans (in 
English). (Czechoslovak Acad. Sci., Prague.) 


The present paper aims at a formulation of the typical experience 
theory problem, which is a natural consequence of the detailed 
study of common features occurring in many special cases. Although 
we do not propose a clear-cut definition of experience theory, we 
emphasize that its object is to improve our decision procedure
gradually by utilizing past experience obtained from results of
experimentation and observation. Four theorems are stated in 
which a special case dealing with continuous time is solved. Three 
of them assume delay in construction of the decision process, and 
the fourth works without any delay. 


Continuous Stochastic Approximations—M. Driml and O. Hans 
(in English). (Czechoslovak Acad. Sci., Prague.) 


Roughly speaking, this paper deals with a continuous stochastic
approximation method or, a little more precisely, with the continuous
and probabilistic analog of the classical fixed-point theorem for
separable Banach spaces. The most important feature of the original
Robbins-Monro stochastic approximation process can be expressed
as follows: at each time instant a single experiment is performed,
the level of which has been determined previously on the basis of
prior outcomes only. With this feature in mind, we aim to define
a procedure that continuously approximates the fixed point
of the expected value of a stationary ergodic stochastic process with
values in a separable Banach space, using at each instant only the
past history, with a constant positive delay, to choose the level.
The result is given, together with its discrete analog.


Conditional Expectations for Generalized Random Variables— 
M. Driml and O. Hanš (in English). (Czechoslovak Acad. Sci.,
Prague.) 


Stochastic Approximations for Continuous Random Processes— 
M. Driml and J. Nedoma (in English). (Czechoslovak Acad. Sci., 
Prague.) 


The theory of stochastic approximations was founded by H.
Robbins and S. Monro. The main problem considered by this theory
is to find some characteristic point (namely, the zero point or point
of minimum value) of the regression function of a one-parameter
system of random variables. The regression function is assumed
to be unknown, and the characteristic point is determined by
sequential approximation: random samples are taken from
populations with distribution functions whose parameter is chosen
on the basis of the results of the preceding samples. The
approximations of the characteristic point are obtained step by step.

The question arises of extending the method of stochastic
approximations to the continuous case. There are several possible
ways of defining continuous stochastic approximations.
However, the existence of analog computers is a reason for seeking
a method which enables the use of these computers; such a method
is discussed in this paper.
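For comparison with the continuous schemes discussed here, the classical discrete Robbins-Monro procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analog-computer method of the paper. It assumes a regression function that increases through its zero, gains a_n = c/n, and a caller-supplied noisy observation function; the names are hypothetical.

```python
import random


def robbins_monro(noisy_regression, x0, steps, c=1.0, seed=0):
    """Discrete Robbins-Monro iteration for the zero of an unknown regression function.

    noisy_regression(x, rng) returns one random observation whose mean M(x)
    is assumed to increase through zero. Gains a_n = c / n satisfy
    sum a_n = infinity and sum a_n**2 < infinity.
    """
    rng = random.Random(seed)
    x = x0
    for n in range(1, steps + 1):
        # the next level depends only on previous outcomes
        x = x - (c / n) * noisy_regression(x, rng)
    return x


def observe(x, rng):
    """Hypothetical experiment: M(x) = 2 (x - 3) observed with unit Gaussian noise."""
    return 2.0 * (x - 3.0) + rng.gauss(0.0, 1.0)
```

Calling `robbins_monro(observe, 0.0, 20000)` returns a value close to the root x = 3, even though no individual observation of M(x) is exact.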


Problems of Statistics Related to Markov Processes—R. M. Fortet 
(in French). (Univ. of Paris.) 


The following general problem is considered in the special case
of a Markov process. Given a random function X(t) defined
on an interval of time (0, t), the observation yielding the maximum
possible information would be continuous observation of X(t)
on (0, t); but very often it would be easier to proceed
with a periodic discrete observation of X(t), with period T, at the
instants 0, T, 2T, ..., (n − 1)T (with nT = t). It is then of
interest to evaluate the loss of information entailed by such
periodic discrete observation compared with continuous
observation.


On a Problem in the Theory of Queueing—B. V. Gnedenko (in 
Russian). (Ukrainian Acad. Sci., Kiev.) 


Up to the present time, as far as the author knows, the possibility
that serving equipment may drop out of the operating state has not
been considered in queueing theory. In the present
paper, we consider one of the basic problems in queueing theory with
regard for this possibility. Consideration is limited to the case
where a demand that finds all units busy or out of order
disappears immediately. Two cases are investigated: 1) if a unit
fails during service, the demand being serviced disappears even if
other units are free; 2) if a unit fails during service and there is
a free unit, the demand from the failed unit is transferred to the
free unit and service continues. The probability that k units are
serving demands at instant t is calculated in these two cases, and
the limit of this probability as t → ∞ is evaluated.


On a Simple Linear Model in Gaussian Processes—J. Hajek (in 
English). (Czechoslovak Acad. Sci., Prague.) 


This paper contains 1) a proof of the existence of a random variable
which is a sufficient statistic for the linear model, 2) a criterion for
the regular case, and 3) a method of finding, in the regular case, the
sufficient statistic mentioned in 1). These results are applied to
processes with independent increments, to Markov processes and
to stationary processes.




An Elementary Convergence Theorem—O. Hanš (in English).
(Czechoslovak Acad. Sci., Prague.)


Random Fixed Point Approximation by Differentiable Trajectories—
O. Hanš and A. Špaček (in English). (Czechoslovak Acad. Sci.,
Prague.)


The Entropy of the Swedish Language—H. Hansson (in English). 
(Tel. AB. L. M. Ericsson, Stockholm.) 


Two methods suggested by C. Shannon have been used to determine
the entropy of the Swedish language. The results are in rather
good agreement with those obtained by others for English and
German.
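Shannon's first method estimates the entropy from N-gram statistics through the conditional entropies F_N, the uncertainty of a letter given the N − 1 preceding letters. His second method relies on prediction experiments with human subjects and is not sketched here. The Python below is a minimal illustration of the N-gram estimate on an arbitrary string, with hypothetical function names. Real estimates need large text samples, because F_N computed from finite counts tends to be underestimated when N is large.

```python
import math
from collections import Counter


def entropy_bits(counts):
    """Entropy in bits of the empirical distribution held in a Counter."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conditional_entropy(text, n):
    """Estimate F_n: entropy of a letter given the preceding n - 1 letters.

    Counts overlapping n-grams and their (n - 1)-letter prefixes at the same
    positions, then returns H(n-grams) - H(prefixes).
    """
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    prefixes = Counter()
    for gram, c in grams.items():
        prefixes[gram[:-1]] += c
    return entropy_bits(grams) - entropy_bits(prefixes)
```

For the strictly alternating string "abab...", F_1 is 1 bit per letter, but F_2 is 0, since each letter is fully determined by the previous one.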


An Electronic Generator of Random Sequences—J. Havel (in 
English). (Czechoslovak Acad. Sci., Prague.)


In this paper an electronic generator of random sequences is 
described. First, the basic principle of the source of the random 
process and its transformation into a binary sequence of pulses
with probabilities 1/2 and 1/2 is explained. Then the generator itself is
described and block diagrams are given. Also presented is a descrip- 
tion of a unit for converting the waveform into a continuous sta- 
tionary Gaussian process. 


On the Capacity of Periodic and Almost-Periodic Channels—
K. Jacobs (in German). (Univ. of Göttingen.)


This paper concerns itself with the so-called coding theorem. The 
Khinchin coding theorem for stationary channels is explained, and 
the Khinchin proof analyzed. It is then shown that for periodic and 
almost-periodic channels a corresponding coding theorem is obtained. 


Explicit Formulas for the Extrapolation, Filtering and Computation 
of Information Content in the Theory of Gaussian Stochastic 
Processes—A. M. Yaglom (in Russian). (Acad. Sci. USSR, Moscow.) 


A survey is given of some recent investigations related to two 
fields of the theory of probability—the theory of extrapolation and 
filtering, and the theory of information. Such a union of two 
apparently diverse fields is shown to have a definite basis, rather
than being merely an artifice.


Some Properties of Markov Chains Added Modulo k—Z. Koutsky 
(in German). (Czechoslovak Acad. Sci., Prague.) 


Let a random variable be defined as the sum modulo k of the 
first n elements in a Markov chain. The properties of this random
variable are studied; in particular, necessary and sufficient conditions
are given for all k values that this random variable may assume
to become equally probable as n → ∞.
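The distribution of the sum modulo k can be computed exactly by carrying the pair (current state, partial sum mod k) forward one step at a time. The sketch below is a numerical illustration under our own assumptions: a finite chain whose states carry integer values, a known initial distribution, and a hypothetical function name. It does not implement the paper's necessary and sufficient conditions. It only shows one case where the limit is uniform and one where it is not.

```python
def sum_mod_k_distribution(P, values, start, k, n):
    """Distribution of (values[X_1] + ... + values[X_n]) mod k for a Markov chain.

    P[i][j] is the transition probability i -> j, start[i] = P(X_1 = i),
    and values[i] is the integer carried by state i.
    """
    states = range(len(P))
    # joint[i][r] = P(current state = i, partial sum mod k = r)
    joint = [[0.0] * k for _ in states]
    for i in states:
        joint[i][values[i] % k] += start[i]
    for _ in range(n - 1):
        nxt = [[0.0] * k for _ in states]
        for i in states:
            for r in range(k):
                if joint[i][r]:
                    for j in states:
                        nxt[j][(r + values[j]) % k] += joint[i][r] * P[i][j]
        joint = nxt
    return [sum(joint[i][r] for i in states) for r in range(k)]
```

With `P = [[0.9, 0.1], [0.2, 0.8]]`, state values 0 and 1 and k = 3, the three residues become equally probable as n grows. If both state values are multiples of 3, the sum is always 0 mod 3 and the limit is far from uniform.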


Necessary Convergence Conditions for Martingales and Related 
Processes—K. Krickeberg (in German). (Univ. of Heidelberg.)

On a Characterization of the Wiener Process—R. G. Laha and
E. Lukacs (in English). (Catholic Univ. of America, Washington,
D. C.)


The following theorem is proved. Let X(t) be a stochastic process
defined in a finite closed interval [A, B], and let the process be
homogeneous with independent increments, and of second order
with mean value function and covariance function of bounded
variation in [A, B]. Let a(t) and b(t) be two continuous functions
defined in [A, B] such that $a(t)b(t) \neq 0$ for all $t \in [A_1, B_1]$, where
$A < A_1 < B_1 < B$, and suppose a(t) is not proportional to b(t).
Further, let $Y = \int_{A_1}^{B_1} a(t)\,dX(t)$ and $Z = \int_{A_1}^{B_1} b(t)\,dX(t)$ be two
stochastic integrals, defined as limits in the mean. Then the process X(t)
is a Wiener process if, and only if, 1) Y has linear regression on Z,
and 2) the conditional variance of Y, given Z, does not depend on Z.




Some Connections of the Information Quantities of C. Shannon
and R. Fisher with the Theory of Summation of Random Vectors—
Yu. V. Linnik (in Russian). (Univ. of Leningrad.)


In the present work, some connections are established between
the two concepts of quantity of information of Shannon and Fisher.
With their help, a purely information-theoretic proof of the central
limit theorem for random vectors under the Lindeberg condition
is successfully constructed.


The Limit Properties of the Probability Distributions of Bounded
Markov Processes—P. Mandl (in French). (Czechoslovak Acad.
Sci., Prague.)


The present work contains the results of a study of the approach
to the stationary state of the homogeneous Markov processes
describing diffusion bounded by one or two barriers. Different types
of barriers are considered: a particle which arrives at a barrier
may be absorbed or reflected, or the barrier may be elastic.
Also presented is a theorem concerning diffusion without
boundaries.


On Generalized Stochastic Processes—G. Marinescu (in French).
(Romanian Acad. Sci., Bucharest.)


On Measure Theory in Product Spaces—K. Matthes (in German).
(Humboldt Univ., Berlin.)


On Nonergodic Channels—J. Nedoma (in English). (Czechoslovak
Acad. Sci., Prague.)


Several different definitions of the capacity of a channel have
appeared. It can be defined as the upper bound of the number of
sources which are transmissible through the channel with arbitrarily
small probability of error; this is the e capacity of the channel.
The capacity may also be defined as the upper bound of information
rates which are obtained for all sources on the input space of the
channel; this is the R capacity. Shannon’s theorem may be proved
for both these capacities, but the class of channels for which it
holds is more restricted for the latter definition. On the other
hand, the first part of Shannon’s theorem can be proved for all
ergodic sources with entropy rate less than the upper bound of
information rates obtained for those sources on the input of the
channel which give an ergodic source-channel probability; this
upper bound is called the HR capacity.
The question of the relationship of these three capacities arises.
From the definitions of HR capacity and R capacity, it follows
immediately that these are equal for all channels which are ergodic
in the sense that for all ergodic sources the source-channel probability
is ergodic. Also, for a wide class of channels the HR capacity
is less than or equal to the e capacity, which is less than or equal
to the R capacity. Consequently, for ergodic channels all three
capacities are equal.
The aim of this paper is to analyze the validity of such relations
for nonergodic channels.


Asymptotically Stationary Gaussian Random Processes Produced
by Filtering of a Periodic Sequence of Pulses—C. Pantelopoulos
(in French). (Czechoslovak Acad. Sci., Prague.)


Let {R(t, τ)} be a class of impulse responses of linear filters,
where τ is a time constant. Conditions are given under which the
class of output processes of these filters, in response to a periodic
sequence of random, equiprobable, ±1 rectangular pulses, converges
to a stationary Gaussian process as τ → ∞.
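A rough way to see the Gaussian limit is the following illustration, which rests on our own assumptions and not on the paper's conditions. Assume an exponential impulse response with time constant τ and unit-width pulses, and sample the output at a pulse boundary. The output is then Y = Σ a_i e_i with a_i proportional to exp(−i/τ) (the constant factor cancels) and independent equiprobable e_i = ±1. For such sums, E[Y⁴] = 3(Σ a_i²)² − 2 Σ a_i⁴, so the excess kurtosis is exactly −2 Σ a_i⁴ / (Σ a_i²)². This quantity tends to 0, the Gaussian value, as τ grows. The function name is hypothetical.

```python
import math


def excess_kurtosis_exponential(tau, terms_per_tau=40):
    """Excess kurtosis of Y = sum_i a_i * e_i, a_i = exp(-i / tau), e_i = +/-1 equiprobable.

    Uses E[Y^4] = 3 (sum a^2)^2 - 2 sum a^4 for independent symmetric +/-1 terms;
    a Gaussian variable has excess kurtosis 0. The infinite sum is truncated
    after terms_per_tau * tau terms.
    """
    n_terms = max(1, int(terms_per_tau * tau))
    a = [math.exp(-i / tau) for i in range(n_terms)]
    s2 = sum(x * x for x in a)
    s4 = sum(x ** 4 for x in a)
    return -2.0 * s4 / (s2 * s2)
```

Summing the geometric series gives the closed form −2 tanh(1/τ). This is about −1.52 for τ = 1 and about −0.02 for τ = 100, and the function reproduces these values up to truncation error.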


On Information Theory and Discernability in Statistical Decision
Problems—A. Perez (in French). (Czechoslovak Acad. Sci., Prague.)


This work is concerned with the general problem of the
transmissibility of an information source through a communication
channel in the case of abstract alphabets, the time parameter being
either discrete or continuous. The probabilistic concept of
“discernability,” which serves as a starting point for the concept of
“transmissibility,” is here enriched by its fusion with the idea of generalized
risk from the theory of statistical decision functions. After a brief
discussion of the classical model of statistical decision, the concept
of discernability in decision problems is considered, both when there
is and when there is not coding. Then the concept of transmissibility
is introduced, which is followed by consideration of certain
theorems in information theory from the viewpoint of transmissibility.


Experience and the Information Drawn from it with the Aid of the 
Limit Laws of Probability Theory—A. Perez (in French). (Czecho- 
slovak Acad. Sci., Prague.) 


On the Spreading Process—A. Prékopa (in English). (Hungarian 
Acad. Sci., Budapest.) 


Suppose that in an abstract set a random point distribution of 
the Poisson type is given, and suppose that each random point 
generates a further random point distribution in the same space. 
Such a process is realized by the propagation of plants on the plane 
when the wind carries away the seeds. We thus have a time process 
of random point distribution. Such a process is called a spreading 
process, and is studied here. 


An Effective Method of Finding Bayes’ Solution—V. S. Pugachev 
(in Russian). (Acad. Sci. USSR, Moscow.) 


A general method of finding Bayes’ solution is given for an 
arbitrary loss function in the case when the random function being 
observed and estimated depends on a finite-dimensional random 
vector U, and the conditional distribution of the observed random 
function for any fixed value of the vector U is normal. Under very
general conditions, this method makes it possible to find
optimal systems intended for the detection and reproduction of
signals in the presence of additive noise. In particular cases, the
method yields previously known methods for determining
optimal systems for the case of additive normal noise.


On the Existence of Entropy—C. Rajski (in English). (Polish Acad. 
Sci., Warsaw.) 


Let $\xi$ denote a random variable defined in $R_1$, $f(x)$ its probability
density function and $H(\xi)$ its entropy. The following theorem is
proved: the entropy exists provided 1) the pdf exists and is monotonic
except on a finite interval, and 2) there exists a positive
number $\varepsilon$ such that the integral $\int_{-\infty}^{\infty} |x|^{\varepsilon} f(x)\,dx$ converges.


The Pseudometric Space of Discrete Random Variables Defined 
Over a Group—C. Rajski (in English). (Polish Acad. Sci., Warsaw.) 


A proof is given that the set of all discrete random variables
having a common sample space G is a pseudometric space with
distance $d(\xi, \eta) = H(\xi - \eta)$, provided G is a group. A pseudometric
space differs from a metric space only in that $d = 0$
does not imply $\xi = \eta$.
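The distance is easy to evaluate when the group is the cyclic group Z_k. The sketch below assumes a finite joint distribution given as a dictionary of probabilities and measures entropy in bits. Both choices are ours, and the function name is hypothetical. The first usage case shows why the space is only pseudometric: if η = ξ + 1 (mod k), then ξ − η is constant, so d = 0 even though ξ ≠ η.

```python
import math
from collections import defaultdict


def rajski_distance(joint, k):
    """d(xi, eta) = H(xi - eta) for random variables with values in Z_k.

    joint maps pairs (x, y) to P(xi = x, eta = y); entropy is in bits.
    """
    diff = defaultdict(float)
    for (x, y), p in joint.items():
        diff[(x - y) % k] += p  # distribution of the group difference
    return -sum(p * math.log2(p) for p in diff.values() if p > 0)
```

For ξ uniform on Z_4 and η = ξ + 1 the distance is 0. For two independent uniform variables on Z_2 it is 1 bit.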


Dimension, Entropy and Information—A. Rényi (in English). 
(Hungarian Acad. Sci., Budapest.) 


On Optimal Multistage Tests—H. Richter (in German). (Univ. 
of Munich.) 


Normalized ε-Entropy of Sets and the Theory of Transmission of
Information—M. Rosenblatt-Roth (in Russian). (Parhon Univ.,
Bucharest.)


The author has previously studied nonstationary (or, as a special 
case, stationary) stochastic sources and channels with arbitrary 
sets of states, time being considered discrete. In the case where the 
sources and channel inputs possess discrete sets of states, but the 
output sets of states of the channels are arbitrary, the two fundamental
theorems of Shannon were proved for regular sources and
channels. 

In this work the question is posed of the approximation of sto- 
chastic nonstationary (or stationary) sources possessing continuous 
sets of states with stochastic sources possessing discrete sets of 
states, and also the question of the approximation of nonstationary 
(or stationary) stochastic channels possessing continuous sets of 
states at the input by stochastic channels possessing discrete sets 
of states at the input. The fundamental theorems of Shannon are 
considered in these conditions. 




Relationships Between Information Theory and Decision-Function 
Theory—J. Seidler (in English). (Polish Acad. Sci., Gdansk, Poland.) 


The relations between information theory and decision-function 
theory are considered. A decision is called a type-1 decision if it is
an estimator and a type-2 decision if it is an estimating subset. The
concept of a first-stage transformation of the received signals before
making a decision is introduced. Formulas of the Rao-Cramer type
for decisions of the first and second type corresponding to the Bayes
method are derived. General conclusions are given concerning the
application of entropy and amount of information when coding is
not considered. Finally, a new uniqueness theorem for the entropy functional
is proved. 


Some Functionals in Processes—V. Statuliavichus (in Russian).
(Lithuanian Acad. Sci., Vilnius.)


The applicability of the theorem of large deviations to non- 
homogeneous Markov chains with a finite number of possible 
states is studied, as are sequences of chains, the nth chain being one 
with n instants of time. 


Filters and Predictors which Adapt Their Values to the Unknown 
Parameters of the Input Process—O. Sefl (in English). (Czecho- 
slovak Acad. Sci., Prague.) 


This note is connected with the idea of a self-optimizing predictor 
which was described by L. Prouza in 1956. Prouza described a 
discontinuous predictor which adapts its parameters according to 
the measured coefficients of the input process. The main aim of this 
paper is to develop this idea for a continuous predictor which con- 
tinuously adapts its characteristic to the correlation function of 
the input process. 


Statistical Estimation of Provability in Boolean Logic—A. Spacek 
(in English). (Czechoslovak Acad. Sci., Prague.) 


Random Metric Spaces—A. Spacek (in English). (Czechoslovak
Acad. Sci., Prague.)


An abstract set together with a distance function defined for 
all pairs of its elements is said to be a rigid metric space. In view of 
various applications (in particular, in the fields of information 
theory and statistical decision processes, to the probabilistic concept 
of discernability of A. Perez), it is reasonable to replace the rigid 
metric by a random metric in order to obtain a random metric 
space. The properties of such spaces are considered here. 




Random Mikusinski Operators—M. Ullrich (in English). (Czechoslovak
Acad. Sci., Prague.)


After a discussion of the generalization of ordinary Mikusinski 
operators to the multidimensional case, a definition of a random 
Mikusinski operator is given, and its relation to ordinary random 
variables and stochastic processes shown. Then some new definitions, 
for random operators, of notions similar to those used in the theory
of stochastic processes are given, and some fundamental properties
of random Mikusinski operators are proved. Finally, the notion of 
a random operator function is introduced, and this is used for the 
solution of random partial differential equations. 


A Representation Theorem for Random Schwartz Operators— 
M. Ullrich (in English). (Czechoslovak Acad. Sci., Prague.) 


A Contribution to the Theory of Generalized Stationary Random 
Fields—K. Urbanik (in English). (Polish Acad. Sci., Wroclaw, 
Poland.) 


Communication Channels with Finite Past History—K. Winkelbauer 
(in English). (Czechoslovak Acad. Sci., Prague.) 


In this paper, basic theorems which are valid for channels with 
finite past history are given, namely a theorem of the type of the 
Feinstein-Khinchin fundamental lemma on discernability (the
coding theorem) and its converse, the direct and converse parts
of a theorem of the Shannon type for transmission with arbitrarily 
small probabilities of error, theorems of the Shannon type for 
equivocation, the direct and converse parts of a theorem for trans- 
mission with arbitrarily small average frequencies of errors, theorems 
on transmission with arbitrarily small risks with respect to general 
weight functions, a theorem on transmission in the case of equality 
between entropy and capacity, and consequences of the latter 
theorems for the special case of channels with finite memory and, 
more generally, for the case of indecomposable channels. 


Fundamental Equations of the Theory of Pursuit—A. Zieba (in 
English). (Polish Acad. Sci., Wroclaw, Poland.) 


The minimax solution of the general problem of pursuit in the 
plane is presented for the case of one pursuer and one escaper. 
The result may be generalized for more than one pursuer and/or 
escaper as well as for pursuit in n-dimensional space. 


On Certain Infinitesimal Properties of Random Functions—F.
Zitek (in French). (Czechoslovak Acad. Sci., Prague.)


In this paper, certain properties of random interval functions,
such as absolute continuity and differentiability, are studied, as
well as their relationships with the theory of limit laws and the
theory of stochastic differential equations.




Book Reviews


Statistical Theory of Communication—Y. W. Lee 
(John Wiley and Sons, Inc., New York, N. Y.; 1960.
I9 Pages. $16.75) 


Dr. Lee’s book is an excellent self-integrated book on
the first principles of statistical theory of communication,
which, in the author’s usage, means linear least-mean-square
filtering and related subjects. The level of material
is accurately stated in the preface to be that of first-year
graduate students or advanced seniors. It is presently
being used at M. I. T. in a first-year graduate class, and
should be quite understandable and easy to use in various
seminars.

Material such as is presented in this book is absolutely
essential to a clear understanding of closely refined
communication and tracking systems operating near maximum
efficiency, i.e., operating near threshold. It is a particularly
good introduction to statistical communication in that
it will equip the reader with the ability to attack more
difficult material later on.

The book should be fairly easy for communication
engineers to read, as the material is motivated by Lee’s
interest in the filtering of noise from communication
signals. His experimental results contained in the latter
chapters are excellent pieces of motivation.

It is only fair to warn the prospective teacher or student
that even the elements of statistical communication theory
are by no means simple to understand thoroughly. The
student must be used to thinking in integral and
differential equations. The book averages four or five
equations per page, and these equations are used as part of
the normal progression of text material. In other words,
skipping over the equations without understanding them
would be like skipping every other paragraph in a novel.

A teacher will probably find it necessary to make up 
problems for Chapters 3-5, 10, 11, 18, and 19 in order 
to fix in the student’s mind the exact relationships in- 
volved in probability theory. The material on calculus 
of variations definitely should be supplemented either by
other course work or by special notes, inasmuch as certain
communications problems are particularly well handled
by the application of the calculus of variations to the
Wiener optimization.

The book should also prove useful to engineers con- 
fronted with the measurement problem of statistical 
data. Chapters 11 and 12 of the book are particularly 
interesting in showing that the measurement of statistical 
data produces answers which are only probabilistic in 
nature. The book shows how to treat measurement 
problems with theory that can be mastered by most 
instrumentation engineers. The last two chapters (18 and 
19) describe some work with orthogonal functions and
their generation with simple linear networks. This
approach is particularly interesting because it shows simple
ideas which should have good application to the
representation of signals through orthogonal functions. Chapter
13 on the transfer characteristic of linear systems has
proven quite valuable in advanced development work in
the communications field.

E. RECHTIN

C. S. LORENS

Jet Propulsion Lab. 
California Inst. Tech. 
Pasadena, Calif. 


Affiliates of the IRE Professional Group 
on Information Theory 


The Affiliate plan, established by the IRE Board of Directors, enables individuals who are not members 
of IRE, but have an interest in the fields of information transmission, processing and utilization, to join 
PGIT without the expense of joining the parent IRE body. Admission as an Affiliate merely requires 1) 
present membership in a professional society listed below, 2) non-membership in the IRE during the past 
five years, and 3) payment of $8.50 per year. Membership in both IRE and PGIT will cost $14.00. Current 
affiliated societies are: 


American Psychological Association
American Statistical Association
The Institute of Mathematical Statistics
American Documentation Institute
American Geophysical Union
American Institute of Physics and its member Societies:
    Acoustical Society of America
    American Physical Society
    American Association of Physics Teachers
    Optical Society of America
American Mathematical Society
American Speech and Hearing Association
Linguistic Society of America
Linguistic Circle of New York
Modern Language Association of America
Society for Industrial and Applied Mathematics
American Association for the Advancement of Science
American Astronomical Society
American Chemical Society
American Management Association
American Meteorological Society
American Rocket Society
American Society of Mechanical Engineers
Association for Computing Machinery
Audio Engineering Society
Institute of the Aeronautical Sciences
Instrument Society of America
Operations Research Society of America
Society of Motion Picture and Television Engineers


How an Individual May Become an Affiliate of PGIT 


Fill out the application blank below. Detach the completed application and mail with your check for 
$8.50 to the Institute of Radio Engineers, 1 East 79 Street, New York 21, N. Y. 


The Institute of Radio Engineers, Inc., 1 East 79 Street, New York 21, N. Y. 
PROFESSIONAL GROUP ON INFORMATION THEORY—APPLICATION FOR AFFILIATES 


To affiliate with PGIT, indicate the affiliated societies of which you are a member (see above list),
and remit the assessment shown below to IRE Headquarters with this application.


Information requested below MUST be furnished. 
Name (Please Print) ____________________________________________


Mailing Address ________________________________________________


I have not been a member of the IRE for the past five years. 


I enclose $8.50 which is the publications assessment for PGIT. 


(Signed) _______________________________________________________


A STATEMENT OF EDITORIAL POLICY 


The IRE TRANSACTIONS ON INFORMATION THEORY is a quarterly journal devoted
to the publication of papers on the transmission, processing, and utilization of 
information. The exact subject matter of acceptable papers is intentionally, by 
editorial policy, not sharply delimited. Rather, it is hoped that as the focus of 
research activity changes, a flexible policy will permit the TRANSACTIONS to 
follow suit and that it will continue to serve its readers with timely articles on 
the fundamental nature of the communication process. Topics of current appro- 
priateness include the coding and decoding of digital and analog communication 
transmissions, studies of random interferences and of information-bearing signals,
analyses and design of communication and detection systems, pattern recognition,
learning, automata, and other forms of information processing systems.

Papers can be of two kinds, tutorial or research, and should be so indicated. 
The former must be well-written expositions summarizing the state of a field in 
which research is still in progress, or else unify results scattered in the literature. 
Research papers must be original contributions not published elsewhere. They 
must present new methods, concepts, or ideas, or extend old ones to new areas 
of applicability; or, they must present new data, findings or inventions, or solve 
new problems of more than casual interest. They will not be accepted if, in the 
view of the reviewers and editors, they constitute a straightforward and easy 
application of existing theory to a special case of limited interest. It is not necessary 
that the length of each research paper be great; on the contrary, the submission 
of short but formal research notes is to be encouraged. 

In addition to papers, readers are invited to submit notes to the Correspondence 
section. These may include such things as early summaries of important work to 
be published later at greater length, or remarks on material that has already 
appeared. Contributions in the form of “problem statements” are also sought for
the Correspondence section. This category includes problems to which the author 
knows no solution but suspects that another reader might, conjectures for which 
a proof or disproof is desired, and so forth. 


INFORMATION FOR AUTHORS 


Authors are requested to submit editorial correspondence or technical manu- 
scripts to the Editor for possible publication in the PGIT TRANSACTIONS. Papers 
submitted should include a statement as to whether the material has been copy- 
righted, previously published, or submitted for publication elsewhere. 

To expedite reviewing procedures, it is requested that authors submit the 
original and two legible copies of all written and illustrative material. The manu- 
script should be double-spaced, and the illustrations drawn in India ink on drawing 
paper or drafting cloth. Each paper should include a carefully written abstract 
of not more than 200 words. Papers should be prepared for publication in a manner 
similar to those intended for the PROCEEDINGS OF THE IRE. Further instructions
may be obtained from the Editor. The original copy and drawings of material not 
accepted for publication will be returned. 

All technical manuscripts and editorial correspondence should be addressed to 
Arthur Kohlenberg, Melpar, Inc., 11 Galen Street, Watertown 72, Mass. 

Local Chapter activities and announcements, as well as other nontechnical 
news items, should be addressed to the PGIT Newsletter, c/o Prof. N. M. Abramson, 
Electrical Engineering Department, Stanford University, Stanford, Calif. 


INSTITUTIONAL LISTINGS 


The IRE Professional Group on Information Theory is grateful for the assistance
given by the firms listed below and invites application for Institutional Listing
from other firms interested in the field of Information Theory.


REPUBLIC AVIATION CORP., Farmingdale, N. Y. 
Advanced Aircraft, Space Systems, Missile Systems and Electronics 




The charge for an Institutional Listing is $50 per issue or $150 for four con-
secutive issues. Applications for Institutional Listings and checks (made pay- 
able to the Institute of Radio Engineers) should be sent to L. G. Cumming, 
Institute of Radio Engineers, 1 East 79 Street, New York 21, N. Y. 


